THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(Founded by H. C. Carver)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XII 


1941 



THE ANNALS 

OF MATHEMATICAL STATISTICS 


EDITED BY 

S. S. WILKS, Editor

A. T. CRAIG          J. NEYMAN


WITH THE COOPERATION OF

H. C. Carver
H. Cramér
W. E. Deming
G. Darmois
R. A. Fisher
T. C. Fry
H. Hotelling
R. de Mises
E. S. Pearson
H. L. Rietz
W. A. Shewhart


Manuscripts for publication in the Annals of Mathematical Statistics
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original copy
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the Annals is $4.00 per year. Single copies $1.25.
Back numbers are available at the following rates:

Vols. I-IV $5.00 each. Single numbers $1.50.

Vols. V to date $4.00 each. Single numbers $1.25.

Subscriptions, renewals, orders for back numbers and other business
communications should be sent to A. T. Craig, University of Iowa, Iowa City, Iowa.

The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics.


Composed and Printed at the
WAVERLY PRESS, Inc.
Baltimore, Md., U. S. A.



CONTENTS OF VOLUME XII 

Articles 

Aroian, Leo A. Continued Fractions for the Incomplete Beta Function ... 218
Aroian, Leo A. A Study of R. A. Fisher's z-Distribution and the Related
    F-Distribution ... 429
Beale, Frank S. On a Certain Class of Orthogonal Polynomials ... 97
Bellinson, H. R., von Neumann, J., Kent, R. H., and Hart, B. I. The
    Mean Square Successive Difference ... 153
Brookner, R. J. and Wald, A. On the Distribution of Wilks' Statistic
    for Testing the Independence of Several Groups of Variates ... 137
Chung, Kai Lai. On the Probability of the Occurrence of at Least m
    Events Among n Arbitrary Events ... 328
Curtiss, J. H. On the Distribution of the Quotient of Two Chance
    Variables ... 409
Dodd, Edward L. Some Generalizations of the Logarithmic Mean and
    of Similar Means of Two Variates Which Become Indeterminate
    When the Two Variates are Equal ... 422
Dodd, Edward L. The Cyclic Effects of Linear Graduations Persisting
    in the Differences of the Graduated Values ... 127
Doob, J. L. and von Mises, R. Discussion of Papers on Probability
    Theory ... 215
Doob, J. L. Probability as Measure ... 206
Dressel, Paul L. A Symmetric Method for Obtaining Unbiased Estimates
    and Expected Values ... 84
Dwyer, Paul S. The Doolittle Technique ... 449
Dwyer, P. S. The Skewness of the Residuals in Linear Regression
    Theory ... 104
Feller, Willy. On the Integral Equation of Renewal Theory ... 243
Gumbel, E. J. The Return Period of Flood Flows ... 163
Hart, B. I., von Neumann, J., Kent, R. H., and Bellinson, H. R. The
    Mean Square Successive Difference ... 153
Hotelling, Harold. Experimental Determination of the Maximum of
    a Function ... 20
Hsu, Chung Tsi. Samples from Two Bivariate Normal Populations ... 279
Kent, R. H., von Neumann, J., Bellinson, H. R., and Hart, B. I. The
    Mean Square Successive Difference ... 153
McPherson, J. C. On Mechanical Tabulation of Polynomials ... 317
Mood, A. M. On the Joint Distribution of the Medians in Samples from
    a Multivariate Population ... 268




Neyman, J. On a Statistical Problem Arising in Routine Analyses and in
    Sampling Inspections of Mass Production ... 46
Paulson, Edward. On Certain Likelihood-Ratio Tests Associated with
    the Exponential Distribution ... 301
Satterthwaite, Franklin E. A Concise Analysis of Certain Algebraic
    Forms ... 77
Tuckerman, L. B. On the Mathematically Significant Figures in the
    Solution of Simultaneous Linear Equations ... 307
von Mises, R. and Doob, J. L. Discussion of Papers on Probability
    Theory ... 215
von Mises, R. On the Foundations of Probability and Statistics ... 191
von Neumann, John. Distribution of the Ratio of the Mean Successive
    Difference to the Variance ... 367
von Neumann, J., Kent, R. H., Bellinson, H. R., and Hart, B. I. The
    Mean Square Successive Difference ... 153
Wald, Abraham. Asymptotically Most Powerful Tests of Statistical
    Hypotheses ... 1
Wald, Abraham. Some Examples of Asymptotically Most Powerful
    Tests ... 396
Wald, A. and Brookner, R. J. On the Distribution of Wilks' Statistic
    for Testing the Independence of Several Groups of Variates ... 137
Wilks, S. S. On the Determination of Sample Sizes for Setting Tolerance
    Limits ... 91
Young, L. C. On Randomness in Ordered Sequences ... 293

Notes 

Baker, G. A. Tests of Homogeneity for Normal Populations ... 233
Craig, Cecil C. A Note on Sheppard's Correction ... 339
Craig, Cecil C. Note on the Distribution of Non-central t with an
    Application ... 224
Daly, Joseph F. A Problem in Estimation ... 459
Gordon, Robert D. The Estimation of a Quotient when the Denominator
    is Normally Distributed ... 115
Gordon, Robert D. Values of Mills' Ratio of Area to Bounding Ordinate
    and of the Normal Probability Integral for Large Values of the
    Argument ... 364
Greville, T. N. E. The Frequency Distribution of a General Matching
    Problem ... 350
Hoel, Paul G. On Methods of Solving Normal Equations ... 364
Kavanagh, Arthur J. Note on the Adjustment of Observations ... 111
Kendall, M. G. Corrections to a Paper on the Uniqueness Problem of
    Moments ... 464
Kolmogoroff, A. Confidence Limits for an Unknown Distribution
    Function







Mosteller, Frederick. Note on an Application of Runs to Quality
    Control Charts ... 228
Samuelson, Paul A. Conditions that the Roots of a Polynomial be Less
    Than Unity in Absolute Value ... 360
Stewart, W. Mac. A Note on the Power of the Sign Test ... 236
Wald, Abraham. On the Analysis of Variance in Case of Multiple
    Classifications with Unequal Class Frequencies ... 346
Wald, A. and Wolfowitz, J. Note on Confidence Limits for Continuous
    Distribution Functions ... 118
Williams, J. D. Moments of the Ratio of the Mean Square Successive
    Difference to the Mean Square Difference in Samples from a Normal
    Universe ... 239
Wolfowitz, J. and Wald, A. Note on Confidence Limits for Continuous
    Distribution Functions ... 118

Miscellaneous 

Abstracts of Papers ... 123, 470
Announcement Concerning Computation of Mathematical Tables ... 465
Constitution and By-Laws of the Institute ... 474
Directory of the Institute ... 478
Report of the Chicago Meeting of the Institute (Dec. 1940) ... 120
Report of the Chicago Meeting of the Institute (Sept. 1941) ... 468
































ASYMPTOTICALLY MOST POWERFUL TESTS OF STATISTICAL HYPOTHESES¹

By Abraham Wald²

Columbia University, New York City

1. Introduction. Let $f(x, \theta)$ be the probability density function of a variate
$x$ involving an unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by
means of $n$ independent observations $x_1, \cdots, x_n$ on $x$ we have to choose a region
of rejection $W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the
probability that the sample point $E = (x_1, \cdots, x_n)$ will fall in $W_n$ under the
assumption that $\theta$ is the true value of the parameter. For any region $U_n$ of
the $n$-dimensional sample space denote by $g(U_n)$ the greatest lower bound of
$P(U_n \mid \theta)$. For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least
upper bound of

$$P(U_n \mid \theta) - P(T_n \mid \theta).$$

In all that follows we shall denote a region of the $n$-dimensional sample space
by a capital letter with the subscript $n$.

Definition 1. A sequence $\{W_n\}$, ($n = 1, 2, \cdots$, ad inf.), of regions is said to
be an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of
significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for
which $P(Z_n \mid \theta_0) = \alpha$, the inequality

$$\limsup_{n\to\infty} L(Z_n, W_n) \le 0$$

holds.

Definition 2. A sequence $\{W_n\}$, ($n = 1, 2, \cdots$, ad inf.), of regions is said
to be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$
on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n\to\infty} g(W_n) = \alpha$, and if for any
sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n\to\infty} g(Z_n) = \alpha$, the inequality

$$\limsup_{n\to\infty} L(Z_n, W_n) \le 0$$

holds.

Let $\hat\theta_n(x_1, \cdots, x_n)$ be the maximum likelihood estimate of $\theta$ in the
$n$-dimensional sample space. That is to say, $\hat\theta_n(x_1, \cdots, x_n)$ denotes the value of $\theta$


¹ Presented to the American Mathematical Society at New York, February 24, 1940.
² Research under a grant-in-aid from the Carnegie Corporation of New York.



ABRAHAM WALD 


for which the product $\prod_{\alpha=1}^{n} f(x_\alpha, \theta)$ becomes a maximum. Let $W'_n$ be the region
defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge c'_n$, $W''_n$ the region defined by the
inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le c''_n$, and let $W_n$ consist of all points for which at
least one of the inequalities

$$\sqrt{n}(\hat\theta_n - \theta_0) \ge a_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \le -a_n$$

is satisfied. The constants $a_n$, $c'_n$, $c''_n$ are chosen such that

$$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

It will be shown in this paper that under certain restrictions on the probability
density $f(x, \theta)$ the sequence $\{W'_n\}$ is an asymptotically most powerful test of the
hypothesis $\theta = \theta_0$ if $\theta$ takes only values $\ge \theta_0$. Similarly $\{W''_n\}$ is an
asymptotically most powerful test if $\theta$ takes only values $\le \theta_0$. Finally $\{W_n\}$ is an
asymptotically most powerful unbiased test if $\theta$ can take any real value.
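As a rough illustration of the three regions $W'_n$, $W''_n$ and $W_n$, the following Python sketch decides rejection from $\sqrt{n}(\hat\theta_n - \theta_0)$. It assumes, beyond what the text states at this point, that the constants are taken from the limiting normal distribution with variance $c$, so the size is only approximately $\alpha$. The function name `rejects` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejects(theta_hat, theta0, n, c, alpha, kind):
    """Decide whether the sample point falls in the rejection region.

    kind = "upper"     -> region W'_n  (alternatives theta > theta0)
    kind = "lower"     -> region W''_n (alternatives theta < theta0)
    kind = "two-sided" -> region W_n   (any real theta)

    Assumption: critical constants come from the N(0, c) approximation
    to sqrt(n) * (theta_hat - theta0), so the size is only approximate.
    """
    z = sqrt(n) * (theta_hat - theta0)
    normal = NormalDist(mu=0.0, sigma=sqrt(c))
    if kind == "upper":
        return z >= normal.inv_cdf(1.0 - alpha)
    if kind == "lower":
        return z <= normal.inv_cdf(alpha)
    if kind == "two-sided":
        a_n = normal.inv_cdf(1.0 - alpha / 2.0)
        return z >= a_n or z <= -a_n
    raise ValueError("kind must be 'upper', 'lower' or 'two-sided'")
```

With $n = 100$, $c = 1$ and $\alpha = 0.05$, an estimate of 1.18 against $\theta_0 = 1$ gives a scaled deviation of about 1.8, which falls in the one-sided region but not in the two-sided one.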

2. Assumptions on the density function $f(x, \theta)$.

Assumption 1. For any positive $k$

$$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$$

uniformly in $\theta$, where $P(-k < \hat\theta_n - \theta < k \mid \theta)$ denotes the probability that
$-k < \hat\theta_n - \theta < k$ under the assumption that $\theta$ is the true value of the parameter.

Assumption 1 implies somewhat more than consistency of the maximum
likelihood estimate $\hat\theta_n$. In fact, consistency means only that for any positive $k$

$$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1,$$

without asking that the convergence should be uniform in $\theta$. If $\hat\theta_n$ satisfies
Assumption 1 we shall say that $\hat\theta_n$ is a uniformly consistent estimate of $\theta$. A
rigorous proof of the consistency of $\hat\theta_n$ (under certain restrictions on $f(x, \theta)$)
was given by J. L. Doob.³ In an appendix to this paper it will be shown that
under certain conditions $\hat\theta_n$ is uniformly consistent.

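Denote by $E_\theta[\psi(x)]$ the expected value of $\psi(x)$ under the assumption that $\theta$
is the true value of the parameter. That is to say,

$$E_\theta \psi(x) = \int_{-\infty}^{\infty} \psi(x) f(x, \theta)\,dx.$$

For any $x$, for any positive $\delta$, and for any $\theta_1$, denote by $\varphi_1(x, \theta_1, \delta)$ the greatest
lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of
$\partial^2 \log f(x, \theta)/\partial\theta^2$ in the interval $\theta_1 - \delta \le \theta \le \theta_1 + \delta$.

Assumption 2. There exists a positive value $k_0$ such that the expectations
$E_\theta \varphi_1(x, \theta_1, \delta)$ and $E_\theta \varphi_2(x, \theta_1, \delta)$ exist and are continuous functions of $\theta$, $\theta_1$ and $\delta$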

³ J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1937).



TESTS OF STATISTICAL HYPOTHESES 




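in the domain $D$ defined by the inequalities: $0 \le \delta \le \tfrac{1}{2}k_0$,
$\theta_0 - \tfrac{1}{2}k_0 \le \theta_1 \le \theta_0 + \tfrac{1}{2}k_0$, $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$. Furthermore the expectations
$E_\theta[\varphi_1(x, \theta_1, \delta)]^2$ and $E_\theta[\varphi_2(x, \theta_1, \delta)]^2$ exist in $D$ and have a finite upper bound in $D$.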

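Assumption 3. There exists a positive value $k_0$ such that

$$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\,dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial\theta^2}\,dx = 0 \quad \text{for } \theta_0 - k_0 \le \theta \le \theta_0 + k_0.$$

Assumption 3 means simply that we may differentiate with respect to $\theta$ under
the integral sign. In fact

$$\int_{-\infty}^{\infty} f(x, \theta)\,dx = 1$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} f(x, \theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int_{-\infty}^{\infty} f(x, \theta)\,dx = 0.$$

Differentiating under the integral sign, we obtain the relations in Assumption 3.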
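The relations in Assumption 3 can be checked numerically for a concrete density. The sketch below is only an illustration: the exponential density $f(x, \theta) = \theta e^{-\theta x}$ on $x \ge 0$ is an assumed example that does not appear in the paper, and the helper names and the truncation of the infinite range are hypothetical choices.

```python
from math import exp


def trapezoid(g, a, b, steps):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h


def derivative_integrals(theta, upper=60.0, steps=200_000):
    """Integrals of df/dtheta and d2f/dtheta2 for f(x, theta) = theta*exp(-theta*x).

    The density lives on x >= 0; the infinite range is truncated at
    upper / theta, where both integrands are negligible.
    """
    b = upper / theta
    d1 = lambda x: (1.0 - theta * x) * exp(-theta * x)           # df/dtheta
    d2 = lambda x: (theta * x * x - 2.0 * x) * exp(-theta * x)   # d2f/dtheta2
    return trapezoid(d1, 0.0, b, steps), trapezoid(d2, 0.0, b, steps)
```

Both returned values should be zero up to quadrature error, in agreement with the two integrals required to vanish.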
Assumption 4. There exists a positive $\eta$ and a positive $k_0$ such that

$$E_\theta \left| \frac{\partial \log f(x, \theta)}{\partial\theta} \right|^{2+\eta}$$

exists and has a finite upper bound in the interval $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.

3. Some propositions. Denote $\sqrt{n}(\hat\theta_n - \theta)$ by $z_n(\theta)$ and denote the
probability $P[z_n(\theta) < t \mid \theta]$ by $\Phi_n(t, \theta)$.

Proposition 1. Within the $\theta$-interval $[\theta_0 - \tfrac{1}{2}k_0,\ \theta_0 + \tfrac{1}{2}k_0]$, $\Phi_n(t, \theta)$ converges
with $n \to \infty$ uniformly in $t$ and $\theta$ towards the cumulative normal distribution with
zero mean and variance

$$-1 \Big/ E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$$
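The limiting variance in Proposition 1 can be compared with simulation for a concrete density. The sketch below is an illustration under assumptions that are not in the paper: it uses the exponential density $f(x, \theta) = \theta e^{-\theta x}$, for which $\partial^2 \log f/\partial\theta^2 = -1/\theta^2$ and the maximum likelihood estimate is the reciprocal of the sample mean. All function names are hypothetical.

```python
import random
from statistics import pvariance


def scaled_mle_errors(theta, n, reps, seed=0):
    """Simulate z_n = sqrt(n) * (theta_hat_n - theta) for f(x, theta) = theta*exp(-theta*x).

    For this density the maximum likelihood estimate is 1 / (sample mean).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample_mean = sum(rng.expovariate(theta) for _ in range(n)) / n
        errors.append(n ** 0.5 * (1.0 / sample_mean - theta))
    return errors


def limiting_variance(theta):
    """-1 / E[d2 log f / dtheta2]; here d2 log f / dtheta2 = -1 / theta**2."""
    return theta ** 2


def variance_ratio(theta, n, reps, seed=0):
    """Empirical variance of z_n divided by the limiting variance."""
    return pvariance(scaled_mle_errors(theta, n, reps, seed)) / limiting_variance(theta)
```

For moderately large $n$ the ratio returned by `variance_ratio` should be close to 1, which is what the proposition predicts for this example.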

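Proof: In all that follows we assume that $\theta$ takes only values in the interval
$[\theta_0 - k_0,\ \theta_0 + k_0]$, except when the contrary is explicitly stated. Furthermore
we introduce the variable $\theta_1$ and assume that $\theta_1$ takes only values in the interval
$[\theta_0 - \tfrac{1}{2}k_0,\ \theta_0 + \tfrac{1}{2}k_0]$.

Because of Assumption 3 we have

$$E_\theta \frac{\partial \log f(x, \theta)}{\partial\theta} = \int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\,dx = 0. \tag{1}$$

Since

$$\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \frac{1}{f(x, \theta)}\frac{\partial^2 f(x, \theta)}{\partial\theta^2} - \frac{1}{[f(x, \theta)]^2}\left[\frac{\partial f(x, \theta)}{\partial\theta}\right]^2,$$

we get from Assumption 3

$$E_\theta\left[\frac{\partial \log f(x, \theta)}{\partial\theta}\right]^2 = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}. \tag{2}$$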

Hence

$$d(\theta) = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} > 0. \tag{3}$$

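Consider the Taylor expansion

$$\sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = \sum_\alpha \frac{\partial \log f(x_\alpha, \theta_1)}{\partial\theta} + (\theta - \theta_1)\sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} \tag{4}$$

where $\theta'$ lies in the interval $[\theta_1, \theta]$. Denote
$\frac{1}{\sqrt{n}}\sum_\alpha \partial \log f(x_\alpha, \theta_1)/\partial\theta$ by $y_n(\theta_1)$.

For $\theta = \hat\theta_n$ the left hand side of (4) is equal to zero. Hence we have

$$y_n(\theta_1) + [\sqrt{n}(\hat\theta_n - \theta_1)]\,\frac{1}{n}\sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0, \tag{5}$$

or

$$y_n(\theta_1) + z_n(\theta_1)\,\frac{1}{n}\sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0. \tag{6}$$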

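Let $Q_n(\theta_1)$ be the region defined by the inequality

$$\left|\frac{1}{n}\sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} + d(\theta_1)\right| < \nu \tag{7}$$

where $\nu$ denotes a positive number less than the greatest lower bound of $d(\theta_1)$.
We shall prove that

$$\lim_{n\to\infty} P[Q_n(\theta_1) \mid \theta_1] = 1 \tag{8}$$

uniformly in $\theta_1$. Let $\tau_0$ be a positive number such that

$$\left|E_{\theta_1}\varphi_i(x, \theta_1, \tau_0) + d(\theta_1)\right| < \frac{\nu}{2} \qquad (i = 1, 2) \tag{9}$$

for all values of $\theta_1$. Because of Assumption 2 such a $\tau_0$ certainly exists.
Denote by $R_n(\theta_1)$ the region defined by the inequality

$$|\hat\theta_n - \theta_1| \le \tau_0. \tag{10}$$

On account of Assumption 1

$$\lim_{n\to\infty} P[R_n(\theta_1) \mid \theta_1] = 1 \tag{11}$$

uniformly in $\theta_1$. Since $\theta'$ lies in the interval $[\theta_1, \hat\theta_n]$, we have

$$|\theta' - \theta_1| \le \tau_0 \tag{12}$$

for all points in $R_n(\theta_1)$. Hence at any point in $R_n(\theta_1)$ the inequality

$$\frac{1}{n}\sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) \le \frac{1}{n}\sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} \le \frac{1}{n}\sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0) \tag{13}$$

holds.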





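Let $S_n(\theta_1)$ be defined by the inequality

$$\left|\frac{1}{n}\sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) - E_{\theta_1}\varphi_1(x, \theta_1, \tau_0)\right| < \frac{\nu}{2} \tag{14}$$

and $T_n(\theta_1)$ by the inequality

$$\left|\frac{1}{n}\sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0) - E_{\theta_1}\varphi_2(x, \theta_1, \tau_0)\right| < \frac{\nu}{2}. \tag{15}$$

On account of Assumption 2 we have

$$\lim_{n\to\infty} P[S_n(\theta_1) \mid \theta_1] = \lim_{n\to\infty} P[T_n(\theta_1) \mid \theta_1] = 1 \tag{16}$$

uniformly in $\theta_1$.

Denote by $U_n(\theta_1)$ the common part of the regions $R_n(\theta_1)$, $S_n(\theta_1)$ and $T_n(\theta_1)$.
In $U_n(\theta_1)$ we have on account of (9), (14) and (15)

$$\left|\frac{1}{n}\sum_\alpha \varphi_i(x_\alpha, \theta_1, \tau_0) + d(\theta_1)\right| < \nu \qquad (i = 1, 2). \tag{17}$$

From this we obtain (7) because of (13). That is to say, the inequality (7) is
valid everywhere in $U_n(\theta_1)$. Since

$$\lim_{n\to\infty} P[U_n(\theta_1) \mid \theta_1] = 1$$

uniformly in $\theta_1$, our statement about $Q_n(\theta_1)$ is proved. From (6) and (7) we
get that everywhere in $Q_n(\theta_1)$ the inequalities hold:

$$\frac{y_n(\theta_1)}{d(\theta_1) + \nu} < z_n(\theta_1) < \frac{y_n(\theta_1)}{d(\theta_1) - \nu} \quad \text{if } y_n(\theta_1) > 0; \tag{18}$$

$$\frac{y_n(\theta_1)}{d(\theta_1) + \nu} > z_n(\theta_1) > \frac{y_n(\theta_1)}{d(\theta_1) - \nu} \quad \text{if } y_n(\theta_1) < 0. \tag{19}$$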


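Let $z^*_n(\theta_1)$ be defined as follows: $z^*_n(\theta_1) = z_n(\theta_1)$ at any point in $Q_n(\theta_1)$, and
$z^*_n(\theta_1) = y_n(\theta_1)/d(\theta_1)$ at any point outside $Q_n(\theta_1)$.
On account of (8) we obviously have

$$\lim_{n\to\infty}\{P[z^*_n(\theta_1) < t \mid \theta_1] - P[z_n(\theta_1) < t \mid \theta_1]\} = 0 \tag{20}$$

uniformly in $t$ and $\theta_1$.

From equation (1) it follows that $E_{\theta_1} y_n(\theta_1) = 0$. From Assumption 4 it follows
on account of the general limit theorems

$$\lim_{n\to\infty}\left\{P[y_n(\theta_1) < t \mid \theta_1] - \frac{1}{\sqrt{2\pi d(\theta_1)}}\int_{-\infty}^{t} e^{-u^2/(2d(\theta_1))}\,du\right\} = 0 \tag{21}$$

uniformly in $t$ and $\theta_1$. Hence

$$\lim_{n\to\infty}\left\{P\left[\frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1\right] - \sqrt{\frac{d(\theta_1)}{2\pi}}\int_{-\infty}^{t} e^{-d(\theta_1)u^2/2}\,du\right\} = 0.$$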



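Since $\nu$ can be chosen arbitrarily small, we get

$$\lim_{n\to\infty}\left\{P[z^*_n(\theta_1) < t \mid \theta_1] - P\left[\frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1\right]\right\} = 0 \tag{22}$$

uniformly in $t$ and $\theta_1$. Proposition 1 follows from (20), (21) and (22).

Proposition 2. Let $\{W_n\}$ be a sequence of regions such that $P(W_n \mid \theta_0) = \alpha$,
and let $V_n(z)$ be the region defined by the inequality

$$(\hat\theta_n - \theta_0)\sqrt{n} < z.$$

Let $U_n(z)$ be the intersection of $V_n(z)$ and $W_n$. Denote $P[U_n(z) \mid \theta_0]$ by $F_n(z)$.
Denote furthermore $P[W_n \mid \theta_0 + \mu/\sqrt{n}]$ by $G(\mu, n)$. If $F_n(z)$ converges with
$n \to \infty$ to a function $F(z)$, and if $\lim \mu_n = \mu$, then

$$\lim_{n\to\infty} G(\mu_n, n) = \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,dF(z) \tag{23}$$

where

$$c = -1 \Big/ E_{\theta_0}\frac{\partial^2 \log f(x, \theta_0)}{\partial\theta^2}.$$

Proof: First we show

$$\int_{-\infty}^{\infty} dF(z) = \alpha. \tag{24}$$

Denote $P[V_n(z) \mid \theta_0]$ by $\Phi_n(z)$. On account of Proposition 1, $\Phi_n(z)$ converges
uniformly to the cumulative normal distribution $\Phi(z)$ with zero mean and
variance $c$. It is obvious that

$$F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1) \quad \text{for } z_2 > z_1. \tag{25}$$

Hence

$$F(z_2) - F(z_1) \le \Phi(z_2) - \Phi(z_1) \quad \text{for } z_2 > z_1. \tag{26}$$

From (25) we get

(27) [formula illegible in source]

Hence

(28) [formula illegible in source]

Since $F_n(z) \le \alpha$ and therefore $F(z) \le \alpha$, we get from (28)

$$\lim_{z\to\infty} F(z) = \alpha. \tag{29}$$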





Since $F_n(z) \le \Phi_n(z)$, we have $F(z) \le \Phi(z)$, and therefore

$$\lim_{z\to-\infty} F(z) = 0. \tag{30}$$

The equation (24) follows from (29) and (30).

It follows easily from (26) that the integral on the right hand side of the
equation (23) exists and is finite.

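Let us denote $\theta_0 + \mu_n/\sqrt{n}$ by $\theta_n$. Consider the Taylor expansions

$$\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_0 - \hat\theta_n)\sum_\alpha \frac{\partial}{\partial\theta}\log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_0 - \hat\theta_n)^2\sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta'_n) \tag{31}$$

and

$$\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_n - \hat\theta_n)\sum_\alpha \frac{\partial}{\partial\theta}\log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_n - \hat\theta_n)^2\sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta''_n) \tag{32}$$

where $\theta'_n$ lies in the interval $[\theta_0, \hat\theta_n]$ and $\theta''_n$ lies in the interval $[\theta_n, \hat\theta_n]$. Since
$\hat\theta_n$ is the maximum likelihood estimate, we get from (31) and (32)

$$\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_0 - \hat\theta_n)^2\sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta'_n), \tag{33}$$

$$\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac{1}{2}(\theta_n - \hat\theta_n)^2\sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta''_n). \tag{34}$$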

Denote by $\beta$ a real variable which can take any value between $-2\mu$ and $+2\mu$.
Denote by $R_n$ the region defined by the inequality

(35) | 0n - 0o | < n~\ 

From Proposition 1 it follows easily that

$$\lim_{n\to\infty} P(R_n \mid \theta_0 + \beta/\sqrt{n}) = 1 \tag{36}$$


uniformly in /3. Denote 2n~* by r n . Then for almost all n the following 
inequalities hold at any point in R n : 


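$$\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta'_n) \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n); \tag{37}$$

$$\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \frac{\partial^2}{\partial\theta^2}\log f(x_\alpha, \theta''_n) \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n). \tag{38}$$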

Denote by $S_n$ the region in which (35), (37) and (38) simultaneously hold. It
is obvious that

$$\lim_{n\to\infty} P(S_n \mid \theta_0 + \beta/\sqrt{n}) = 1$$



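uniformly in $\beta$. Denote $\theta_0 + \beta/\sqrt{n}$ by $\theta_n(\beta)$. From Assumption 2 it follows
easily that

$$\lim_{n\to\infty} E_{\theta_n(\beta)}\left\{\frac{1}{n}\sum_\alpha \varphi_i(x_\alpha, \theta_0, \tau_n)\right\} = E_{\theta_0}\frac{\partial^2}{\partial\theta^2}\log f(x, \theta_0) = -\frac{1}{c} \qquad (i = 1, 2) \tag{39}$$

uniformly in $\beta$. Furthermore the variance of $\frac{1}{n}\sum_\alpha \varphi_i(x_\alpha, \theta_0, \tau_n)$, if $\theta_n(\beta)$ is the
true value of the parameter $\theta$, converges to zero with $n \to \infty$ uniformly in $\beta$.
Hence a sequence $\{\lambda_n\}$, ($n = 1, 2, \cdots$, ad inf.), of positive numbers can be
given such that

$$\lim_{n\to\infty} \lambda_n = 0 \tag{40}$$

and

$$\lim_{n\to\infty} P[T_n \mid \theta_n(\beta)] = 1 \tag{41}$$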

uniformly in /3, where the region T n is defined by the inequality 


(42) 


1 <Pi(.%a , 00 , T n ) | 1 
i - T — 

n c 


< Ann 4 


d «» 1 , 2 ). 


From (37) and (38) it follows that in the intersection T' n of T n and S n 

* _ . 

< Kn 


(43) 
and 

(44) 


n ? dd i log ^ Xc “ 6 ^ + 7 


n ? dfP- e ") + - 


< A„?T*. 


i 

n j 


We get from (33), (34), (35), (43) and (44) that at any point in T' n 

(45) E log A*-, O - E log /(*., flo) = [( 0 O - 0 „) s _ (#„ „ d n ) 3 ] + x 

where | A„ | < pA„, and p denotes a constant not depending on n 
On account of (36) and (41) we have

$$\lim_{n\to\infty} P[T'_n \mid \theta_n(\beta)] = 1 \tag{46}$$

uniformly in $\beta$.

Denote by $T''_n(z)$ the intersection of $U_n(z)$ (defined in Proposition 2) and $T'_n$.
Denote furthermore $P[T''_n(z) \mid \theta_0]$ by $F^*_n(z)$.

Since

$$n[(\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2] = n[(\theta_0 - \hat\theta_n)^2 - (\theta_0 - \hat\theta_n + \mu_n/\sqrt{n})^2] = -\mu_n^2 + 2\sqrt{n}\,\mu_n(\hat\theta_n - \theta_0),$$





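we get from (45) and (46)

$$\lim_{n\to\infty}\left\{P[T''_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-(\mu_n^2 - 2\mu_n t)/(2c)}\,dF^*_n(t)\right\} = 0 \tag{47}$$

uniformly in $z$. It is obvious that

$$\lim_{n\to\infty}\{P[T''_n(z) \mid \theta_n] - P[U_n(z) \mid \theta_n]\} = 0 \tag{48}$$

uniformly in $z$. Hence we get from (47)

$$\lim_{n\to\infty}\left\{P[U_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-(\mu_n^2 - 2\mu_n t)/(2c)}\,dF^*_n(t)\right\} = 0 \tag{49}$$

uniformly in $z$. It follows from (49) that for any positive $L$

$$\lim_{n\to\infty}\left\{P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] - \int_{-L}^{L} e^{-(\mu_n^2 - 2\mu_n t)/(2c)}\,dF^*_n(t)\right\} = 0. \tag{50}$$

Since $\lim \mu_n = \mu$, $\lim\,[F^*_n(t) - F_n(t)] = 0$ uniformly in $t$, and since
$\lim F_n(t) = F(t)$ uniformly in $t$, we get from (50)

$$\lim_{n\to\infty}\{P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n]\} = \int_{-L}^{L} e^{-(\mu^2 - 2\mu t)/(2c)}\,dF(t). \tag{51}$$

Now let us calculate the limit of $P[V_n(z) \mid \theta_n]$ if $n \to \infty$. The region $V_n(z)$ is
defined by the inequality

$$\sqrt{n}(\hat\theta_n - \theta_0) < z. \tag{52}$$

This inequality can be written as follows:

$$(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n. \tag{53}$$

Since $\lim \mu_n = \mu$, we get on account of Proposition 1

$$\lim_{n\to\infty} P[(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n \mid \theta_n] = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{z-\mu} e^{-t^2/(2c)}\,dt = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{z} e^{-(t-\mu)^2/(2c)}\,dt. \tag{54}$$

Hence

$$\lim_{n\to\infty} P[V_n(z) \mid \theta_n] = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{z} e^{-(t-\mu)^2/(2c)}\,dt \tag{55}$$

uniformly in $z$.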

For any positive $\varepsilon$ let $L_\varepsilon$ denote the positive number satisfying the condition:
(56)





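From (56) we easily get on account of (26)

$$0 \le \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu t)/(2c)}\,dF(t) - \int_{-L_\varepsilon}^{L_\varepsilon} e^{-(\mu^2 - 2\mu t)/(2c)}\,dF(t) \le \frac{\varepsilon}{2}. \tag{57}$$

Since the region $U_n(z_2) - U_n(z_1)$ is a subset of $V_n(z_2) - V_n(z_1)$ for $z_2 > z_1$,
we have on account of (55) and (56)

$$\limsup_{n\to\infty}\{P[U_n(\infty) \mid \theta_n] - P[U_n(L_\varepsilon) \mid \theta_n] + P[U_n(-L_\varepsilon) \mid \theta_n]\} \le \frac{\varepsilon}{2}. \tag{58}$$

Since

$$P[U_n(\infty) \mid \theta_n] = G(\mu_n, n),$$

we have

$$\limsup_{n\to\infty}\left|G(\mu_n, n) - \{P[U_n(L_\varepsilon) \mid \theta_n] - P[U_n(-L_\varepsilon) \mid \theta_n]\}\right| \le \frac{\varepsilon}{2}. \tag{59}$$

From (51), (57) and (59) we get

$$\limsup_{n\to\infty}\left|G(\mu_n, n) - \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu t)/(2c)}\,dF(t)\right| \le \varepsilon. \tag{60}$$

Since $\varepsilon$ can be chosen arbitrarily small, Proposition 2 is proved.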
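The integral in (23) can be evaluated numerically in a simple special case. The sketch below is an assumed example rather than part of the proof: it takes $W_n$ to be the one-sided region, so that in the limit $dF(z)$ is the $N(0, c)$ density on $[z_0, \infty)$ and zero elsewhere, and compares the integral with the probability that an $N(\mu, c)$ variate exceeds $z_0$. The function names and the truncation of the integration range are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def power_via_formula_23(mu, c, alpha, upper=12.0, steps=100_000):
    """Integrate exp((2*mu*z - mu**2) / (2*c)) dF(z) for the region z >= z0.

    Assumed example: W_n is the one-sided region, so in the limit
    dF(z) is the N(0, c) density on [z0, infinity) and zero elsewhere.
    The range is truncated at z0 + upper * sqrt(c).
    """
    sd = sqrt(c)
    z0 = NormalDist(0.0, sd).inv_cdf(1.0 - alpha)
    h = upper * sd / steps

    def integrand(z):
        density = exp(-z * z / (2.0 * c)) / sqrt(2.0 * pi * c)
        return exp((2.0 * mu * z - mu * mu) / (2.0 * c)) * density

    total = 0.5 * (integrand(z0) + integrand(z0 + upper * sd))
    for i in range(1, steps):
        total += integrand(z0 + i * h)
    return total * h


def power_direct(mu, c, alpha):
    """P(y >= z0) when y is N(mu, c): the limiting power of the same region."""
    sd = sqrt(c)
    z0 = NormalDist(0.0, sd).inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist(mu, sd).cdf(z0)
```

The two functions should agree up to quadrature error, because the exponential factor in (23) is the ratio of the $N(\mu, c)$ density to the $N(0, c)$ density.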

4. Theorems on asymptotically most powerful tests. 

Theorem 1: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided that the parameter $\theta$ is restricted
to values $\ge \theta_0$.

Proof: Assume that there exists a test $\{W_n\}$ of size $\alpha$ such that

$$\limsup_{n\to\infty} L(W_n, M_n) = \delta > 0. \tag{61}$$


ofth6 ^™ w im 

The expression 

^ ~ 6 ») -y/n = u n , > o , 

« - “n 0 " 0 * 8 ; S “” the Mmmption 

r? “6 Wf variate, tto aeqdC uTm™t k°T“‘ d “l ributio " witt 
is defined by the inequality q ' ^ must be bounded. Hence M„ 

(64) 


4,/Vft = «„ 





where

$$\lim_{n\to\infty} \varepsilon_n = 0. \tag{65}$$

From Assumption 1, (64) and (65) it follows easily that if
$\lim \theta_{n'} > \theta_0$, then $\lim P(M_{n'} \mid \theta_{n'}) = 1$.
Hence on account of (62) we must have

$$\lim \theta_{n'} = \theta_0. \tag{66}$$

If there would exist a subsequence $\{n^*\}$ of $\{n'\}$ such that $\lim \mu_{n^*} = \infty$, then
on account of (66) and Proposition 1 we would have $\lim P(M_{n^*} \mid \theta_{n^*}) = 1$,
which is in contradiction to (62). Hence the expression (63) must be bounded.
Let $\{n''\}$ be a subsequence of $\{n'\}$ such that

$$\lim \mu_{n''} = \mu \ge 0. \tag{67}$$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ and the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis that $\theta = \theta_0$. Consider the
subsequence $\{n'''\}$ of the sequence $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$
towards a function $F(z)$. The existence of such a subsequence $\{n'''\}$ can be
proved as follows: Denote the probability $P[(\hat\theta_n - \theta_0)\sqrt{n} < z \mid \theta_0]$ by $\Phi_n(z)$.
On account of Proposition 1, $\Phi_n(z)$ converges with $n \to \infty$ uniformly in $z$ towards

$$\varphi(z) = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{z} e^{-t^2/(2c)}\,dt \tag{68}$$

where $c$ has the same value as in (23).

We obviously have

$$F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1) \tag{69}$$

for any pair of values $z_1$, $z_2$ for which $z_2 > z_1$. Hence

$$\limsup_{n\to\infty}\{F_n(z_2) - F_n(z_1)\} \le \varphi(z_2) - \varphi(z_1). \tag{70}$$

Since $F_n(z)$ is a monotonic function of $z$, our statement follows easily from (70)
and the fact that $\varphi(z)$ is uniformly continuous. Hence on account of
Proposition 2 we have

$$\lim P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,dF(z) \tag{71}$$

and

$$\lim P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,d\Phi(z) \tag{72}$$





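where

$$\Phi(z) = 0 \quad \text{for } z < z_0, \tag{73}$$

$$\Phi(z) = \varphi(z) - \varphi(z_0) \quad \text{for } z \ge z_0, \tag{74}$$

and $z_0$ is given by

$$1 - \varphi(z_0) = \alpha. \tag{75}$$

From (62), (71) and (72) we get

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,d[F(z) - \Phi(z)] = \delta > 0. \tag{76}$$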

**—CD 


Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$
be a critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single observation
on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by $D(v)$ the
intersection of $B$ and the region $C(v)$ defined by the inequality $y < v$. Denote by
$H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then the power of
the test $B$ with respect to the alternative $\nu = \mu$ is given by the following
expression

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,dH(v). \tag{77}$$

If the region $B$ is given by the inequality $y \ge v_0$ where $v_0$ is chosen such that the
size of $B$ is equal to $\alpha$, then $H(v) = \Phi(v)$ where the function $\Phi$ is defined by the
equations (73), (74) and (75). Since the latter test is uniformly most powerful⁴
with respect to all alternatives $\nu > 0$, for any positive $\mu$ the inequality

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,d[H(v) - \Phi(v)] \le 0 \tag{78}$$

holds. Let

$$\psi(v) = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{v} e^{-t^2/(2c)}\,dt.$$

It is obvious that

$$H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1 \tag{79}$$

and

$$\int_{-\infty}^{\infty} dH(v) = \alpha. \tag{80}$$


(S)“ bu “°“ ,o “* ‘ h ~' y °' i,, “ os 





On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative
function of $v$ such that

$$K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1 \tag{79'}$$

and

$$\int_{-\infty}^{\infty} dK(v) = \alpha \tag{80'}$$

hold, then there exists a sequence $\{B^{(i)}\}$, ($i = 1, 2, \cdots$, ad inf.), of regions of
size $\alpha$ such that

$$\lim_{i\to\infty} H^{(i)}(v) = K(v)$$

uniformly in $v$. Since (78) holds for $H(v) = H^{(i)}(v)$ and since

$$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1,$$

it is easy to see that (78) will hold also for $H(v) = K(v)$. Hence for any
monotonically non-decreasing non-negative function $K(v)$ for which (79') and (80')
are fulfilled, also (78) must hold. Since $F(v)$ is a distribution function which
satisfies (79') and (80'), we have a contradiction to (76). This proves Theorem 1.

Theorem 2: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le A_n$,
where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically
most powerful test of the hypothesis $\theta = \theta_0$, provided that the parameter $\theta$ is restricted
to values $\le \theta_0$.

We omit the proof since it is entirely analogous to that of Theorem 1.
Theorem 3: Let $M_n$ be the region consisting of all points which satisfy at least
one of the inequalities

$$\sqrt{n}(\hat\theta_n - \theta_0) \le -A_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \ge A_n.$$

The constant $A_n > 0$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an
asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$.

Proof: Assume that there exists a sequence $\{W_n\}$ ($n = 1, 2, \cdots$, ad inf.)
of regions such that

$$P(W_n \mid \theta_0) = \alpha, \tag{81}$$

$$\lim_{n\to\infty} g(W_n) = \alpha \tag{82}$$

and

$$\limsup_{n\to\infty} L(W_n, M_n) = \delta > 0. \tag{83}$$

We shall deduce a contradiction from this assumption. On account of (83)
there exists a subsequence $\{n'\}$ of $\{n\}$ such that

$$\lim\{P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'})\} = \delta. \tag{84}$$





The expression

$$(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'} \tag{85}$$

must be bounded. The proof of this statement is omitted, since it is analogous
to the proof of the similar statement about (63). Hence there exists a
subsequence $\{n''\}$ of $\{n'\}$ such that

$$\lim \mu_{n''} = \mu. \tag{86}$$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ with the region
$(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis $\theta = \theta_0$. Consider a subsequence $\{n'''\}$
of $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$ towards a function $F(z)$.
The existence of such a sequence $\{n'''\}$ can be proved in the same way as the
similar statement in the proof of Theorem 1. Hence on account of Proposition 2
and (86) we have

$$\lim P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,dF(z) \tag{87}$$

and

$$\lim P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,d\Phi(z) \tag{88}$$

where

$$\Phi(z) = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{z} e^{-t^2/(2c)}\,dt \quad \text{for } z \le -z_0, \tag{89}$$

$$\Phi(z) = \Phi(-z_0) \quad \text{for } -z_0 \le z \le z_0, \tag{90}$$

$$\Phi(z) = \Phi(-z_0) + \frac{1}{\sqrt{2\pi c}}\int_{z_0}^{z} e^{-t^2/(2c)}\,dt \quad \text{for } z \ge z_0, \tag{91}$$

and

$$\Phi(-z_0) = \frac{\alpha}{2}. \tag{92}$$

From (84), (87) and (88) it follows that

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu z)/(2c)}\,d[F(z) - \Phi(z)] = \delta. \tag{93}$$


Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$
be a critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single
observation on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by
$D(v)$ the intersection of $B$ with the region $C(v)$ defined by the inequality $y < v$.
Denote by $H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then
the power of the test $B$ with respect to the alternative $\nu = \mu$ is given by

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,dH(v). \tag{94}$$





If the region $B$ consists of all points which satisfy at least one of the inequalities
$y \le -v_0$, $y \ge v_0$, and if $v_0 > 0$ is chosen such that the size of $B$ is equal to $\alpha$,
then $H(v) = \Phi(v)$, where $\Phi(v)$ is defined by the equations (89)-(92). Since the
latter test is a uniformly most powerful unbiased test,⁵ for any $\mu$ the inequality

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,d[H(v) - \Phi(v)] \le 0 \tag{95}$$

holds. Let

$$\psi(v) = \frac{1}{\sqrt{2\pi c}}\int_{-\infty}^{v} e^{-t^2/(2c)}\,dt.$$

It is obvious that

$$H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1, \tag{96}$$

$$\int_{-\infty}^{\infty} dH(v) = \alpha \tag{97}$$

and

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,dH(v) \text{ has a minimum for } \mu = 0. \tag{98}$$

On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative
function of $v$ such that

$$K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1, \tag{96'}$$

$$\int_{-\infty}^{\infty} dK(v) = \alpha, \tag{97'}$$

$$\int_{-\infty}^{\infty} e^{-(\mu^2 - 2\mu v)/(2c)}\,dK(v) \text{ has a minimum for } \mu = 0, \tag{98'}$$

then there exists a sequence $\{B^{(i)}\}$ ($i = 1, 2, \cdots$, ad inf.) of unbiased regions
of size $\alpha$ such that

$$\lim_{i\to\infty} H^{(i)}(v) = K(v)$$

uniformly in $v$. Since (95) holds for $H(v) = H^{(i)}(v)$ ($i = 1, 2, \cdots$, ad inf.),
and since

$$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1) \quad \text{for } v_2 > v_1,$$

it is easy to see that (95) holds also for $H(v) = K(v)$. Hence for any
monotonically non-decreasing non-negative function $K(v)$ for which (96'), (97') and
(98') are fulfilled, also (95) must be fulfilled if we substitute $K(v)$ for $H(v)$.


⁵ J. Neyman and E. S. Pearson, l. c., p. 29.





Since $F(v)$ is a distribution function which satisfies (96'), (97') and (98'), we
have a contradiction to (93). This proves Theorem 3.
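The limiting behaviour of the two-sided region in Theorem 3 can be illustrated with the normal variate $y$ used in the proof. The sketch below computes the power of the region $|y| \ge v_0$ when $y$ is $N(\mu, c)$. It is an illustration of the limiting case only, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_limiting_power(mu, c, alpha):
    """Power of the region |y| >= v0 when y is N(mu, c).

    v0 is fixed by the size condition P(|y| >= v0) = alpha under mu = 0.
    """
    sd = sqrt(c)
    v0 = NormalDist(0.0, sd).inv_cdf(1.0 - alpha / 2.0)
    shifted = NormalDist(mu, sd)
    return shifted.cdf(-v0) + 1.0 - shifted.cdf(v0)
```

The power equals $\alpha$ at $\mu = 0$ and exceeds $\alpha$ for every other $\mu$, which is the unbiasedness property expressed by (98).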


5. Appendix. Proof of the uniform consistency of $\hat\theta_n$. It will be shown here
that under certain conditions on the density function $f(x, \theta)$, Assumption 1,
i.e. uniform consistency of $\hat\theta_n$, can be proved.

For any open subset $\omega$ of the $\theta$-axis we denote by $\varphi(x, \omega)$ the least upper
bound, and by $\psi(x, \omega)$ the greatest lower bound of
$\partial^2 \log f(x, \theta)/\partial\theta^2$ with respect
to $\theta$ in the set $\omega$. For any function $\lambda(x)$ we denote by $E_\theta\lambda(x)$ the expected value
of $\lambda(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$$E_\theta\lambda(x) = \int_{-\infty}^{\infty} \lambda(x) f(x, \theta)\,dx.$$

Denote furthermore by $P(\hat\theta_n \in \omega \mid \theta)$ the probability that $\hat\theta_n$ will fall in $\omega$ under
the assumption that $\theta$ is the true value of the parameter. Finally denote by $\Omega$
the parameter space and assume that $\Omega$ is either the whole real axis or a
subset of it.

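Proposition 3. $\hat\theta_n$ is a uniformly consistent estimate of $\theta$, i.e. for any positive $k$

$$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$$

uniformly for all $\theta$ in $\Omega$, if the following two conditions are fulfilled:

Condition I. For all values $\theta$ in $\Omega$

$$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\,dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial\theta^2}\,dx = 0.$$

Condition II. For any value $\theta$ in $\Omega$ there exists an open interval $\omega(\theta)$ containing $\theta$
and having the following three properties:

II$_a$. $\lim_{n\to\infty} P[\hat\theta_n \in \omega(\theta) \mid \theta] = 1$
uniformly for all $\theta$ in $\Omega$.

II$_b$. $E_\theta\varphi[x, \omega(\theta)]$ is a bounded function of $\theta$ in $\Omega$, and the least upper bound $A$ of
$E_\theta\varphi[x, \omega(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II$_c$. $E_\theta\psi[x, \omega(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

Condition I means simply that we may differentiate under the integral sign.
In fact

$$\int_{-\infty}^{\infty} f(x, \theta)\,dx = 1$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} f(x, \theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int_{-\infty}^{\infty} f(x, \theta)\,dx = 0.$$

Differentiating under the integral sign, we obtain Condition I.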





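In case that $\omega(\theta)$ is the whole axis Condition II_a reduces to the condition
that $\hat\theta_n$ exists.

In order to prove Proposition 3, we show first that for any positive $\tau$,

(99)    $\lim_{n=\infty} P\left[ -\tau < \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} < \tau \,\middle|\, \theta \right] = 1$

uniformly for all $\theta$ in $\Omega$. We have on account of Condition I

(100)   $E_\theta \frac{\partial \log f(x, \theta)}{\partial\theta} = \int_{-\infty}^{+\infty} \frac{\partial f(x, \theta)}{\partial\theta}\,dx = 0.$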

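Since

    $\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \frac{1}{f(x, \theta)} \frac{\partial^2 f(x, \theta)}{\partial\theta^2} - \left[ \frac{\partial \log f(x, \theta)}{\partial\theta} \right]^2,$

we have on account of Condition I

(101)   $E_\theta \left[ \frac{\partial \log f(x, \theta)}{\partial\theta} \right]^2 = - E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$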
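The identity (101) can be checked numerically for a particular density. The following is a minimal sketch, assuming the exponential density f(x, theta) = theta * exp(-theta * x) on x >= 0 (an illustrative choice, not one made in the text) and a plain trapezoidal rule; the names `expectation`, `score` and `second_derivative` are hypothetical.

```python
import math

def expectation(g, theta, upper=60.0, steps=100_000):
    """Trapezoidal approximation of E_theta g(x) for the assumed density
    f(x, theta) = theta * exp(-theta * x) on 0 <= x <= upper."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(x) * theta * math.exp(-theta * x)
    return total * h

theta = 1.5

def score(x):
    # d log f / d theta for the assumed exponential density
    return 1.0 / theta - x

def second_derivative(x):
    # d^2 log f / d theta^2 for the assumed exponential density
    return -1.0 / theta ** 2

mean_score = expectation(score, theta)                      # equation (100)
var_score = expectation(lambda x: score(x) ** 2, theta)     # left member of (101)
neg_curvature = -expectation(second_derivative, theta)      # right member of (101)
```

For this density both members of (101) reduce to 1/theta^2, so the two numerical values should agree up to the quadrature error.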


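According to Condition II, $E_\theta \psi[x, \omega(\theta)] < 0$ and is a bounded function of $\theta$.
Since $E_\theta\, \partial^2 \log f(x, \theta) / \partial\theta^2 < 0$ and $> E_\theta \psi[x, \omega(\theta)]$, the left hand side of (101), i.e.
the variance of $\partial \log f(x, \theta) / \partial\theta$, is a bounded function of $\theta$. From this and the
equation (100) we obtain easily (99). Consider the Taylor expansion

(102)   $\frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = (\theta - \hat\theta_n)\, \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2},$

where $\theta'_n$ lies in the interval $[\theta, \hat\theta_n]$. Let $\epsilon$ be an arbitrary positive number and
denote by $Q_n(\theta)$ the region defined by the inequality

(103)   $\left| \frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} \right| < \epsilon.$

On account of (99) we have

    $\lim_{n=\infty} P[Q_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$.

Denote by $R_n(\theta)$ the region defined by the inequality

(106)   $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)] < \frac{A}{2} < 0.$

On account of Condition II_b

    $\lim_{n=\infty} P[R_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Denote by $B_n(\theta)$ the region in which $\hat\theta_n \in \omega(\theta)$. Since
in $B_n(\theta)$

    $\frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2} \le \frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)],$

we have in the intersection $R'_n(\theta)$ of $R_n(\theta)$ and $B_n(\theta)$

(107)   $\frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta'_n)}{\partial\theta^2} < \frac{A}{2} < 0.$

Denote by $U_n(\theta)$ the intersection of $Q_n(\theta)$ and $R'_n(\theta)$. It is obvious that

(108)   $\lim_{n=\infty} P[U_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. From (102), (103) and (107) we get that in $U_n(\theta)$

(109)   $|\hat\theta_n - \theta| \le \frac{2\epsilon}{|A|}.$

Hence on account of (108)

    $\lim_{n=\infty} P\left( |\hat\theta_n - \theta| \le \frac{2\epsilon}{|A|} \,\middle|\, \theta \right) = 1$

uniformly for all $\theta$ in $\Omega$. Since $\epsilon$ can be chosen arbitrarily, Proposition 3 is
proved.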

Conditions I and II are sufficient but not necessary for the uniform consistency
of $\hat\theta_n$. For sufficiently small $\omega(\theta)$ the conditions II_b and II_c are rather
weak. In fact, on account of (101) we have

    $E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} < 0.$

Hence for sufficiently small intervals $\omega(\theta)$, under certain continuity conditions,
also $E_\theta \varphi[x, \omega(\theta)]$ will be negative. However, in some cases it may be difficult to
verify II_a for small $\omega(\theta)$. On the other hand, for sufficiently large $\omega(\theta)$ (certainly
for $\omega(\theta) = [-\infty, +\infty]$) II_a can easily be verified, but the conditions II_b
and II_c might be unnecessarily strong. In cases where II_b or II_c does not hold
for $\omega(\theta) = [-\infty, +\infty]$ and the validity of II_a is not apparent, the following
Lemma may be useful:

Lemma: Proposition 3 remains valid if we substitute for Condition II the condition

II'_a. Denote by $T_n$ the set of all points at which $\hat\theta_n$ exists and

(110)   $\sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = 0$

has at most one root in $\theta^*$. Then $\lim_{n=\infty} P[T_n \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$,
and there exists a positive $k$ such that for $\omega(\theta) = I(\theta) = (\theta - k, \theta + k)$ the
following two conditions hold:

II'_b. $E_\theta \varphi[x, I(\theta)]$ is a bounded function of $\theta$ in $\Omega$, and the least upper bound $A$
of $E_\theta \varphi[x, I(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

II'_c. $E_\theta \psi[x, I(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

In cases where II_b or II_c is not fulfilled for $\omega(\theta) = [-\infty, +\infty]$ the verification
of the conditions of the Lemma may be easier than that of II.

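Our Lemma can be proved as follows: Consider the Taylor expansion

(111)   $\frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = \frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) + (\theta^* - \theta)\, \frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'),$

where $\theta'$ lies in $[\theta, \theta^*]$. Denote by $V_n(\theta)$ the region defined by

(112)   $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] < \tfrac{1}{2} A < 0.$

On account of II'_b we have

(113)   $\lim_{n=\infty} P[V_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Let $W_n(\theta)$ be the region defined by

(114)   $\left| \frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) \right| < \tau.$

From Condition I and Condition II'_c it follows easily that

(115)   $\lim_{n=\infty} P[W_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. For all values $\theta^*$ in the interval $I(\theta)$ we have

(116)   $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] \ge \frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta').$

Because of (112) and (116) we have in $V_n(\theta)$

(117)   $\frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta') < \tfrac{1}{2} A < 0$

for all values $\theta^*$ in the interval $I(\theta)$. Let $\tau$ be less than $|\tfrac{1}{2} k A|$. Then in the
intersection $W'_n(\theta)$ of the regions $V_n(\theta)$ and $W_n(\theta)$ we obviously have on account
of (114) that the values of the left hand side of (111) for $\theta^* = \theta + k$ and $\theta^* = \theta - k$
will be of opposite sign. Hence at any point of $W'_n(\theta)$ the equation (110)
has at least one root which lies in the interval $I(\theta)$. Since (110) has at most
one root in $T_n$ and since $\hat\theta_n$ is a root of (110), we get that at any point of the
intersection $W''_n(\theta)$ of $W'_n(\theta)$ and $T_n$, $\hat\theta_n$ lies in $I(\theta)$. Since

(118)   $\lim_{n=\infty} P[W''_n(\theta) \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$,

also

(119)   $\lim_{n=\infty} P[\hat\theta_n \in I(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. The relation (119) combined with the conditions II'_b
and II'_c is equivalent to Condition II. Hence our Lemma is proved.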
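The sign-change step of the proof, which locates a root of (110) inside I(theta), can be illustrated on a small fixed sample. This is only a sketch under stated assumptions: the exponential density f(x, theta) = theta * exp(-theta * x), the sample values, and the helper names `score_sum` and `root_in_interval` are all hypothetical choices made for the illustration.

```python
def score_sum(theta_star, xs):
    """Left member of (110) for the assumed density theta * exp(-theta * x)."""
    return sum(1.0 / theta_star - x for x in xs)

def root_in_interval(xs, lo, hi, tol=1e-12):
    """Bisection for a root of (110) in [lo, hi]; needs opposite signs at the ends."""
    f_lo, f_hi = score_sum(lo, xs), score_sum(hi, xs)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = score_sum(mid, xs)
        if f_lo * f_mid <= 0:
            hi = mid                 # the root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # the root lies in [mid, hi]
    return 0.5 * (lo + hi)

sample = [0.2, 0.9, 0.4, 1.1, 0.7, 0.5, 0.95, 0.6]   # illustrative data only
theta, k = 1.5, 0.5                                   # assumed true value and half-width
left = score_sum(theta - k, sample)
right = score_sum(theta + k, sample)
theta_hat = root_in_interval(sample, theta - k, theta + k)
```

For this density the left member of (110) is strictly decreasing in theta*, so the root is unique and coincides with the maximum likelihood estimate n / (x_1 + ... + x_n).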



EXPERIMENTAL DETERMINATION OF THE MAXIMUM OF A 

FUNCTION 1 

By Harold Hotelling 
Columbia University, New York City 

1. The necessary background for efficient experimental determinations. We 
shall deal with the problem of arranging an experiment for determining the 
value of x for which an unknown function f(x) is a maximum or minimum. 
This problem is to be distinguished from those of estimating the maximum or 
minimum itself, and of studying the distributions of such estimates, problems 
to which Bernstein [1] and Rice [2] have contributed. 

The range of applications in which determinations of maximizing and mini¬ 
mizing values are important is extremely wide. Among these are the deter¬ 
mination of the time of year at which the number of algae or bacilli in a lake 
is a maximum, and the amount of fertilizers and of irrigation water making the 
yield of a crop a maximum. The magnetic permeabilities of permalloys, perminvars
and permendurs as functions of the induction, and the hardness of a
copper-iron alloy as a function of the time of aging at 500°C., possess smooth 
maxima having interest in telephony, [3], [4]. The effective range of a gun is a 
function of the speed of burning of the powder, a variable which can be con¬ 
trolled. Almost every entrepreneur has a fervent desire to know the selling
prices that will yield a maximum profit, and a few have undertaken controlled
experiments with a view to finding out. There are also numerous practical
problems of minimizing costs; for example, the cost of operating a ship as a 
function of its speed possesses a minimum. We shall confine our attention 
chiefly to the experimental determination of maxima, since such problems seem 
to occur naturally with greater frequency in applications; there is no loss of
generality in this, since f(x) has a maximum where -f(x) has a minimum.

We shall assume that, for each value of x in the set we shall select, one or
more observations may be made on y = f(x), and that these are
afflicted with errors which are independently distributed about zero with a
[...] if f(x) is a linear function of
[...] coefficients $\beta_0, \beta_1, \cdots, \beta_p$ (for example
[...]
[...] whether or not the errors are normally
[...] the fourth moment of errors is finite, and if the number N

St.ti.tic sad the 


of observations is large, the estimated coefficients will be distributed in an
approximately normal manner; and so also will any function of them that is
regular in a fixed neighborhood of its “population value.” By the “population
value” of a function $\phi(b_0, \cdots, b_p)$ we mean $\phi(\beta_0, \cdots, \beta_p)$. In particular, if

    $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$

has a maximum for $x = \xi$ of the simplest type, such that $f'(\xi) = 0$ and $f''(\xi) < 0$,
so that $\xi$ is a simple root of the equation

    $f'(\xi) = \beta_1 + 2\beta_2 \xi + \cdots + p\beta_p \xi^{p-1} = 0,$

and if $x_0$ is an estimate of $\xi$ found from the polynomial fitted by the method of
least squares, so that

    $b_1 + 2 b_2 x_0 + \cdots + p b_p x_0^{p-1} = 0,$

this last equation defines $x_0$ as a function of $b_1, \cdots, b_p$. The function is, to
be sure, multiple-valued when p > 2; but for sufficiently large values of N the
probability will become arbitrarily great that the roots obtained from a random
experiment will each differ by an arbitrarily small quantity from one of the roots
of $f'(x) = 0$. Then provided we have a sufficient preliminary approximate knowledge
of $\xi$, we may choose the root nearest $\xi$; and the probability distribution
of this root, which in nearly all experiments will be a single-valued function

    $\phi(b_1, \cdots, b_p),$

will approach normality of form, with standard error of order $N^{-1/2}$, about a
mean differing from

    $\xi = \phi(\beta_1, \cdots, \beta_p)$

at most by terms of order $N^{-1}$, which are thus negligible in comparison with the
standard error. The situation will be effectively the same if, without knowing $\xi$
in advance even approximately, we choose the root $x_0$ giving the greatest value
$f(x_0)$, provided $f(\xi)$ is greater than any other value of $f(x)$.

From these considerations it appears advisable, whenever the unknown func¬ 
tion is capable of being represented adequately by a polynomial of degree p 
considerably less than the number N of observations, to fit a polynomial of 
degree p by least squares, and from it to determine the maximizing value by 
differentiation. In practice, however, there are obstacles to carrying out such 
a procedure with confidence. The form of the function is usually not known; 
it is far from clear what value should be given p even if the function is to be 
regarded as a polynomial; the use of a polynomial which does not give a suffi¬ 
ciently good fit, with observations taken at a considerable distance from the 
maximizing value, perhaps separated from it by other maxima and minima,
appears to be a highly dubious proceeding; and if p is taken large, the labor of 
calculation becomes excessive. For all these reasons it is desirable to assign 
the values of x which are to be the basis of the experimental work close enough to
the maximizing value $\xi$ so that a polynomial of very low degree will fit adequately
in the neighborhood.

We shall restrict ourselves to functions having continuous derivatives of all
relevant orders 2 in a neighborhood of $\xi$. Such a function can in a sufficiently
small neighborhood be approximated by a polynomial of the second degree.
The necessity of using a polynomial of higher degree can therefore be avoided,
when a fairly good knowledge of the function is already in hand, and when the
number N of observations that can be made is large enough, by choosing all
the values of x in a sufficiently small neighborhood of $\xi$. We shall suppose that
this is done; that is, a regression equation

    $Y = b_0 + b_1 x + b_2 x^2$

is fitted by least squares to a large number of observations after choosing the
values of x quite close to the true maximizing value $\xi$; and the estimate $x_0$ of $\xi$
is a solution of $dY/dx = b_1 + 2 b_2 x = 0$, so that

    $x_0 = -\frac{b_1}{2 b_2}.$

We shall examine the errors in x 0 arising both from the inadequacy that may 
exist in the quadratic approximation and from the random errors of observation, 
and shall consider what distribution of x may most appropriately be chosen to
reduce the errors of both kinds, and to place them in a suitable balance with 
each other. 

It will be observed that a fairly definite preliminary knowledge of the function 
under investigation is required for such a program. Any criterion for the selec¬
tion of values of x for experimentation must involve not only the value of £ 
but also the values of the first few derivatives in a neighborhood of £, or some 
similar information. The requirement of preliminary information is essential 
for the efficient design of experiments in general. For instance the efficiency 
of an agricultural field experiment depends on the correctness of the appraisal, 
before the experiment is laid down, of the general nature of the fertility gradients 
likely to exist in the field and of the variances due to error and main effects 
which will be revealed more accurately by the experiment itself.

2 Other cases may well arise in practice and deserve separate consideration in connection
with the particular investigations in which they arise. For example various physical
properties of alloys, regarded as functions of the proportion of a particular constituent,
have maxima but may have discontinuous derivatives because of the phenomena of
crystallization and solution of one metal in another. The assumptions appropriate to an
investigation, parallel to that of the present paper, of the proper organization of experiments
for finding such metallurgical maxima must be drawn from metallurgy. The case of
continuous derivatives is however of widespread importance. If no regularity assumption is
made about the function, one set of N values of x is as good as another, and no set is likely
to tell very much about the function if it is one of the violently irregular ones utilized
in the theory of functions to emphasize the necessity of studying that subject.

If the preliminary information is incorrect, a properly arranged self-contained experiment
will nevertheless give results which are valid, in the sense that the significance
probabilities calculated from them by accurate methods are correct, but will be 
inefficient, in the sense that another experiment of the same cost, based on better 
preliminary information, would be more likely to detect real effects through the 
smallness of such a calculated probability. The efficient conduct of experi¬ 
mentation thus proceeds in stages of ascending magnitude. A large-scale in¬ 
vestigation should be preceded by a smaller one designed primarily to obtain 
information for use in designing the large one. The small preliminary investiga¬ 
tion may well in turn be preceded by a still smaller pre-preliminary investigation, 
and so on, 3 like an army marching after an advance guard, which follows a more 
advanced smaller detachment, which follows a still smaller and still more ad¬ 
vanced unit, which follows a “point.” At the very beginning of the process of 
chain experimentation will stand work based on little or no clear information 
of the kind required for efficient design. This first phase will be speculative 
and exploratory in character. Neither its cost nor its accuracy can well be 
estimated in advance. It is a favorite, but not exclusive, preoccupation of men 
of genius. Many of its results turn out to be worthless. But it is an essential 
preliminary to well-organized research directed to definite aims defined
qualitatively in advance.

After the first speculative and unsystematic phase in the knowledge of a 
subject is past, but before the careful, economical organization of an accurate 
investigation, an intermediate type of exploration is needed to supply estimates 
of the parameters required for the design of the full-scale investigation. In the 
present case such a systematic though small-scale experiment might perhaps 
consist in dividing a range within which the desired maximizing value $\xi$ is known
to lie into equal parts, making at least two observations at each of the ends of
these intervals, and fitting a polynomial of at least the fifth degree by least
squares. This will make possible estimates of the parameters $\sigma, \beta_1, \cdots, \beta_5$
(and hence of $\xi$) required for using the efficient designs which we shall obtain.
At least six different values of x are required for fitting the polynomial of the 
fifth degree. The fitting process is facilitated by taking them in arithmetic 
progression and using orthogonal polynomials. 


3 A remarkable example of such a series of investigations is the chain of sample censuses
of area of jute in Bengal carried out for the Indian Central Jute Committee under the
direction of Prof. P. C. Mahalanobis annually beginning in 1937. Each year’s work is
designed primarily to obtain information for planning the next year’s, and a sequence of
four or five such investigations, each considerably larger than the preceding, is planned
to lead up to an eventual annual sampling of the whole immense jute area in the province.
A partial account of this is given in [5], a fuller one in confidential but printed reports of
the Indian Central Jute Committee, Calcutta.

Certain multiple-sample schemes in manufacturing inspection also provide good
examples of chain experiments, [6].





2. Sampling errors and bias in the quadratic approximation. Let us measure
all values of x from the value $\xi$ under investigation which makes f(x) a maximum.
Then $\xi = 0$, and in the expansion

(1)    $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots$

we shall have $\beta_1 = 0$ and $\beta_2 \le 0$; we shall assume that $\beta_2 < 0$. An observation
$y_\alpha$ corresponding to a chosen value $x_\alpha$ will have, by assumption, an error $\Delta_\alpha$ of
zero expectation and variance $\sigma^2$, such that

(2)    $y_\alpha = f(x_\alpha) + \Delta_\alpha.$

A quadratic estimate

(3)    $Y = b_0 + b_1 x + b_2 x^2$

of f(x) is obtained by means of normal equations which may be written

       $a_0 b_0 + a_1 b_1 + a_2 b_2 = Sy$
(4)    $a_1 b_0 + a_2 b_1 + a_3 b_2 = Sxy$
       $a_2 b_0 + a_3 b_1 + a_4 b_2 = Sx^2 y,$

where S stands for summation over all the observations, so that, for example,
$Sy = \Sigma y_\alpha = y_1 + y_2 + \cdots + y_N$, and where

(5)    $a_k = S x^k.$

In particular, $a_0 = N$. A determinate solution is possible only if there are at
least three distinct values of x; we shall always suppose therefore that this is
the case. This is equivalent to assuming that the determinant a of the coefficients
in (4) is not zero. A greater number of observations y is necessary to
obtain an estimate of the variance $\sigma^2$, and furthermore we shall suppose this
number large in our approximations, but since repeated observations may be
made for each value of x, it is not essential that there be more than three values
of x in the distribution to be selected.
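The normal equations (4) and the estimate x_0 = -b_1 / (2 b_2) can be sketched in a few lines. This is a minimal illustration, assuming exact rational arithmetic and a solution by Cramer's rule; the function names `det3`, `fit_quadratic` and `maximizing_value` and the sample data are hypothetical and are not taken from the paper.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_quadratic(xs, ys):
    """Solve the normal equations (4) for [b0, b1, b2] by Cramer's rule."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    a = [sum(x ** k for x in xs) for k in range(5)]                    # a_k = S x^k, as in (5)
    rhs = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # Sy, Sxy, Sx^2 y
    A = [[a[i + j] for j in range(3)] for i in range(3)]
    d = det3(A)
    if d == 0:
        raise ValueError("at least three distinct values of x are needed")
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        coeffs.append(det3(M) / d)
    return coeffs

def maximizing_value(b):
    """Estimate x0 = -b1 / (2 b2) of the maximizing value."""
    return -b[1] / (2 * b[2])

# Error-free observations of f(x) = 3 + 4x - 2x^2, whose maximum is at x = 1.
xs = [0, 0, 1, 2, 2, 3]
ys = [3, 3, 5, 3, 3, -3]
b = fit_quadratic(xs, ys)
x0 = maximizing_value(b)
```

With error-free observations of an exactly quadratic function the fit recovers the coefficients, and a design with fewer than three distinct values of x is rejected because the determinant a vanishes.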

If we put

(6)    $\delta b_k = b_k - \beta_k, \qquad \gamma_k = S x^k \Delta,$

k = 0, 1, 2, substitute (1) in (2) and the result in (4), and utilize (5) and
(6), we obtain

       $a_0\,\delta b_0 + a_1\,\delta b_1 + a_2\,\delta b_2 = \gamma_0 + a_3 \beta_3 + a_4 \beta_4 + \cdots$
(7)    $a_1\,\delta b_0 + a_2\,\delta b_1 + a_3\,\delta b_2 = \gamma_1 + a_4 \beta_3 + a_5 \beta_4 + \cdots$
       $a_2\,\delta b_0 + a_3\,\delta b_1 + a_4\,\delta b_2 = \gamma_2 + a_5 \beta_3 + a_6 \beta_4 + \cdots$

It follows that the errors $\delta b_k$ are homogeneous linear functions of the terms in
the right members of (7), and will therefore be small if these terms
are small. Of these quantities, the $\gamma_k$'s will be stochastically of the
order $N^{1/2}$ for large samples with any fixed set of values of x. When the equations
are solved, their coefficients will be of the order of $N^{-1}$, so that the product
is of order $N^{-1/2}$, and becomes negligible if N is large enough. The coefficients
$a_k$ of $\beta_3, \beta_4, \cdots$ can be kept small if the values of x are chosen to lie within a sufficiently
restricted range. Of course the coefficients $a_k$ in the left members of (7) will
also be small in this case, but not small enough to offset fully the smallness of
those on the right. To see this, we observe that if all the values of x be multiplied
by any quantity g, $a_k$ is multiplied by $g^k$, while




(8)    $a = \begin{vmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \end{vmatrix}$

is multiplied by $g^6$. The cofactors of the last column are proportional respectively
to $g^4$, $g^3$ and $g^2$. Hence, in the expression for $\delta b_2$, the coefficient of $\beta_3$ is
of order g, that of $\beta_4$ is of order $g^2$, and so on, the coefficients of the $\beta$'s of higher
orders vanishing more and more rapidly with g as we go on in the sequence.
The like is true of $\delta b_1$ and $\delta b_0$, which vanish even more rapidly with g. Thus
we may, by restricting sufficiently the range of x on the basis of the assumed
preliminary knowledge of the function, and taking a sufficiently large sample
of observations, bring it about that the probability will be arbitrarily close to
unity that the $\delta b_k$'s are less than any assigned limits.
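The scaling argument can be verified exactly on a small design. The sketch below is illustrative only: the four values of x, the multiplier g = 1/2 and the function names are assumptions, and the coefficient of beta_3 in delta b_2 is obtained from (7) by Cramer's rule using the cofactors of the last column of (8).

```python
from fractions import Fraction

def power_sums(xs, top):
    """a_k = S x^k for k = 0, ..., top."""
    return [sum(x ** k for x in xs) for k in range(top + 1)]

def last_column_cofactors(a):
    """Cofactors of the elements a_2, a_3, a_4 in the last column of (8)."""
    c1 = a[1] * a[3] - a[2] ** 2
    c2 = -(a[0] * a[3] - a[1] * a[2])
    c3 = a[0] * a[2] - a[1] ** 2
    return c1, c2, c3

def det_a(a):
    """The determinant (8), expanded along its last column."""
    c1, c2, c3 = last_column_cofactors(a)
    return a[2] * c1 + a[3] * c2 + a[4] * c3

def beta3_coefficient_in_db2(a):
    """Coefficient of beta_3 when (7) is solved for delta b_2."""
    c1, c2, c3 = last_column_cofactors(a)
    return (c1 * a[3] + c2 * a[4] + c3 * a[5]) / det_a(a)

xs = [Fraction(v) for v in (-1, 0, 2, 3)]   # hypothetical design
g = Fraction(1, 2)                          # hypothetical multiplier
a_full = power_sums(xs, 5)
a_scaled = power_sums([g * x for x in xs], 5)
```

Halving every value of x divides the determinant by 64 and halves the coefficient of beta_3 in delta b_2, in agreement with the orders g^6 and g stated above.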

Let us, in particular, restrict the range sufficiently and take a large enough
sample to make it reasonable to regard $\delta b_2$ as negligible in comparison with $\beta_2$.
The error in the estimate

(9)    $x_0 = -\frac{b_1}{2 b_2}$

of the maximizing value $\xi$ will, since we are taking $\xi = 0$, be $x_0$ itself, and may
be written

    $\delta x_0 = -\frac{\delta b_1}{2(\beta_2 + \delta b_2)} = -\frac{1}{2} \frac{\delta b_1}{\beta_2} \left( 1 - \frac{\delta b_2}{\beta_2} + \cdots \right),$

where the terms other than 1 in the last parentheses are negligible. The problem
of minimizing the error $\delta x_0$ is then virtually equivalent to minimizing the error
$\delta b_1$. In section 5 it will be shown that it is not until we reach terms of the
order of $g^6$ that the errors $\delta b_2$ need be taken into account. We shall first discuss
the errors in $x_0$ of lower orders in g, and thus confine the discussion to $\delta b_1$. For
the present we shall take as the quantity to be made as small as possible the
expectation of the square of this last error, $E(\delta b_1)^2$. This is not the same as the
variance of $b_1$, since $E\delta b_1$ is not in general zero. We have, in fact, by transposing
a familiar formula for the variance,

(10)   $E(\delta b_1)^2 = (E\delta b_1)^2 + \sigma_{b_1}^2,$



thus dividing our minimand into two parts, due respectively to the bias arising
from the neglect of terms of third and higher orders, and to the usual sampling
errors.

By the usual least-square theory, the sampling variance of $b_1$ is

(11)   $\sigma_{b_1}^2 = \mu \sigma^2,$

where $\mu$ is the cofactor of the central element in a, divided by a, that is,

(12)   $\mu = (a_0 a_4 - a_2^2)/a.$

Since $\mu$ is of the order of $g^{-2}$, we may reduce the sampling variance as much as
we please by taking the values of x sufficiently far removed from $\xi$. If f(x) is
definitely known to be only of the second degree, a wide dispersion of the desirable
values of x is thus indicated, since in this case $E\delta b_1 = 0$, as appears by taking
the expectation of each term in (7). But if, as will usually be the case,
f(x) has terms of higher orders than the second, an excessively wide dispersion
may increase the bias $E\delta b_1$ to such an extent as to render the quadratic
approximation inapplicable.
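The trade-off between the variance factor in (11) and (12) and the bias can be made concrete by writing b_1 as a weighted sum of the observations. The sketch below assumes that solving (4) gives b_1 = S (lambda + mu x + nu x^2) y, with lambda, mu, nu the cofactors of the second column of (8) divided by a; the design, the test functions and all function names are hypothetical.

```python
from fractions import Fraction

def second_column_ratios(a):
    """lambda, mu, nu: cofactors of the second column of (8), divided by a."""
    c12 = -(a[1] * a[4] - a[2] * a[3])
    c22 = a[0] * a[4] - a[2] ** 2
    c32 = -(a[0] * a[3] - a[1] * a[2])
    det = a[1] * c12 + a[2] * c22 + a[3] * c32   # expansion of (8) along column two
    return c12 / det, c22 / det, c32 / det

def b1_weights(xs):
    """Weights w with b_1 = sum(w * y), together with the variance factor mu of (12)."""
    a = [sum(x ** k for x in xs) for k in range(5)]
    lam, mu, nu = second_column_ratios(a)
    return [lam + mu * x + nu * x * x for x in xs], mu

def expected_b1(xs, f):
    """E b_1 when the observations have expectation f(x)."""
    weights, _ = b1_weights(xs)
    return sum(w * f(x) for w, x in zip(weights, xs))

design = [Fraction(v) for v in (-1, 0, 2, 3)]   # hypothetical design
narrow = [x / 2 for x in design]                # the same design with g = 1/2

weights, mu = b1_weights(design)
_, mu_narrow = b1_weights(narrow)
bias_quadratic = expected_b1(design, lambda x: 7 - 3 * x * x)
bias_cubic = expected_b1(design, lambda x: x ** 3)
bias_cubic_narrow = expected_b1(narrow, lambda x: x ** 3)
```

Halving the spread of the design multiplies the variance factor by four and divides the bias caused by a cubic term by four, which is the balance between the two parts of (10) that the choice of design has to strike.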

In taking the expectation of each term of (7) and then solving for $E\delta b_1$ we
obtain, since $E\gamma_k = 0$ according to the definition of $\gamma_k$, and because $E\Delta = 0$,
a result of the form

(13)   $E\delta b_1 = B_3 \beta_3 + B_4 \beta_4 + B_5 \beta_5 + \cdots$

We shall call $B_3$, $B_4$, and $B_5$ respectively the cubic, quartic and quintic components
of the bias, or simply biases. If we denote by $\lambda, \mu, \nu$ the ratios to a
of the cofactors of the second column of a, so that

(14)   $\lambda a_1 + \mu a_2 + \nu a_3 = 1,$

we shall have for the components of bias,

       $B_3 = \lambda a_3 + \mu a_4 + \nu a_5$
(15)   $B_4 = \lambda a_4 + \mu a_5 + \nu a_6$
       $B_5 = \lambda a_5 + \mu a_6 + \nu a_7,$

and so forth. Since $\lambda$, $\mu$, and $\nu$ are of respective orders -1, -2 and -3 in a
multiplier g of all the values of x, $B_3$ is of order 2, $B_4$ is of order 3, and the higher

tr„f ,o i“ e b r« l trs* ££& 

9 "tt ^ Cr ®ff e the sam P lm g variance, which is of the order of g~\ g 

having'! fixed ^tribution 

in absolute value- then if thm-o g .. ma ^ e cubic bias a minimum 





quartic biases; and among these a class minimizing the absolute value of the
total of cubic, quartic and quintic biases, with the modified meaning of the
quintic bias taking account of $\delta b_2$.


3. The cubic and quartic biases. We find, somewhat unexpectedly, that
there exists a class of distributions of x for which the cubic bias is actually zero.
To exemplify this we need give the variable no more than three different values,
which we may call x, y and z, and we may assign to them the arbitrary frequencies
k, m, n of experiments (k + m + n = N). If we put

(16)   $P = \begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{vmatrix} = (x - y)(y - z)(z - x),$

and consider a matrix of three rows and N columns, of which k columns are
identical with the first column of P, m with the second, and n with the third,
it is evident that the sum of the squares of the three-rowed determinants in
this matrix is $kmnP^2$. But this sum of squares is also equal to the determinant
formed from the sums of products of the three rows, and this is a (formula (8)).
Thus $a = kmnP^2 \ne 0$, since x, y, z are all different. Together with the foregoing
3 x N matrix consider another,

(17)   $\begin{pmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{pmatrix},$

having k columns identical with that first written, m identical with the second
written, and n identical with the third. The only non-vanishing three-rowed
determinants in this matrix are formed of these three different columns, and
equal $(xy + yz + zx)P$; there are kmn of them. The sum of products of corresponding
three-rowed determinants in the two matrices is therefore
$kmnP^2(xy + yz + zx)$. But this sum is also equal to the determinant formed
from the sums of products of corresponding rows,

    $\begin{vmatrix} a_0 & a_1 & a_2 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix},$

which, by (15), equals $-aB_3$. It follows that

(18)   $-B_3 = xy + yz + zx.$

There are many real solutions of the equation

(19)   $xy + yz + zx = 0,$






with the three values all different, for example -2, 3, 6. If we assign such values
to our variable, and an arbitrary number of experimental determinations to each
of these values, the cubic bias $B_3$ will be zero.

It will be noticed that such a solution cannot have zero for one of the values.
If, for example, z = 0 in (19), then x or y must also vanish, in violation of the
condition that there must be at least three distinct values. Moreover a solution
cannot be symmetrical about zero; if x + y = 0 it follows from (19) that
x = y = 0. A solution may or may not be symmetrical about a value other
than zero. The values $3 - 2\sqrt{3}$, $(3 - \sqrt{3})/2$, $\sqrt{3}$ satisfy the equation and
are in arithmetic progression, while the solution -2, 3, 6 is asymmetrical.

If we modify (17) by replacing the cubes of the variables by their fourth
powers, and apply the same procedure to the modified matrix, we find that

(20)   $B_4 = -(x + y)(y + z)(z + x).$

Thus there exist sets of three distinct real values making the quartic bias vanish,
for example any set for which x + y = 0; but no such set can at the same time
nullify the cubic bias (18). Since it is ordinarily more important for the cubic
than for the quartic bias to vanish, distributions nullifying (20) are not in
general to be recommended. But in exceptional cases it may be known that
$\beta_3$ is zero, or very small in comparison with $\beta_4$, and then the vanishing of $B_4$
is a more valuable property than that of $B_3$. It will be shown that no distribution
of three or more values exists such that both the cubic and quartic components
of bias are zero.
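Formulas (18) and (20) can be compared with the defining expressions (14) and (15) on particular three-point designs. The following sketch uses exact rational arithmetic; the function name `bias_components`, the chosen frequencies and the second design (-1, 1, 2) are assumptions made for the illustration.

```python
from fractions import Fraction

def bias_components(values, freqs):
    """Return (a, B3, B4) computed from (8), (14) and (15) for a design
    taking each value with the corresponding frequency."""
    a = [sum(n * Fraction(v) ** k for v, n in zip(values, freqs)) for k in range(7)]
    c12 = -(a[1] * a[4] - a[2] * a[3])            # cofactors of the second column of (8)
    c22 = a[0] * a[4] - a[2] ** 2
    c32 = -(a[0] * a[3] - a[1] * a[2])
    det = a[1] * c12 + a[2] * c22 + a[3] * c32    # the determinant a
    lam, mu, nu = c12 / det, c22 / det, c32 / det
    B3 = lam * a[3] + mu * a[4] + nu * a[5]
    B4 = lam * a[4] + mu * a[5] + nu * a[6]
    return det, B3, B4

# The solution -2, 3, 6 of (19), with arbitrarily chosen frequencies 2, 1, 3.
x, y, z = -2, 3, 6
k, m, n = 2, 1, 3
a_det, B3, B4 = bias_components((x, y, z), (k, m, n))
P = (x - y) * (y - z) * (z - x)

# A design with x + y = 0, for which the quartic rather than the cubic bias vanishes.
_, B3_sym, B4_sym = bias_components((-1, 1, 2), (1, 1, 1))
```

The frequencies affect the determinant a but not B_3 or B_4, as (18) and (20) indicate.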

Let us denote by $D_p$ the p-rowed determinant having $a_{i+j-2}$ as the element
in its ith row and jth column. Thus $D_3$ is the same determinant which we have
in (8) called a, and

(21)   $D_4 = \begin{vmatrix} a_0 & a_1 & a_2 & a_3 \\ a_1 & a_2 & a_3 & a_4 \\ a_2 & a_3 & a_4 & a_5 \\ a_3 & a_4 & a_5 & a_6 \end{vmatrix}.$

For every distribution, every $D_p \ge 0$; and a necessary and sufficient condition
that a distribution have p or more distinct values is that $D_p$ be greater than
zero, [7, p. 362]. If $D_p$ is positive, so is each of its principal minors. In
particular, since we are requiring at least three values in a distribution, $D_3 = a > 0$,
and therefore

(22)   $a_2 a_4 - a_3^2 > 0,$

and

(23)   $a_0 a_2 - a_1^2 > 0.$



We shall now consider distributions for which the cubic bias $B_3$ is zero, and
consequently, by (15),

(24)   $\lambda a_3 + \mu a_4 + \nu a_5 = 0,$

and expand $D_4$. From the definition of $\lambda, \mu, \nu$, we have

(25)   $\lambda a_2 + \mu a_3 + \nu a_4 = 0.$

Multiply the last row of the determinant (21) by $\nu$, and add to it $\lambda$ times the
second row and $\mu$ times the third. The last row is thus, by (14), (25), (24)
and (15) transformed into

    $1 \quad 0 \quad 0 \quad B_4,$

while the determinant has been multiplied by $\nu$. Let this new determinant be
expanded with respect to its last row. The cofactor of the first element 1 is

    $G = -\begin{vmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix}.$

Let the last row of this determinant be multiplied by $\nu$, an operation having
the effect of multiplying the whole determinant by $\nu$, and let $\lambda$ times the first
row and $\mu$ times the second row then be added to the last. The last row is
thus, by (14), (25) and (24) reduced to

    $1 \quad 0 \quad 0.$

Hence

    $\nu G = -(a_2 a_4 - a_3^2),$

and consequently

(26)   $\nu^2 D_4 = \nu(aB_4 + G) = \nu a B_4 - (a_2 a_4 - a_3^2).$

Since the first member of this equation is positive or zero, (22) shows that it is
impossible that $B_4$ should equal zero when $B_3 = 0$ as we have assumed. That is,

Either the cubic or the quartic bias of every distribution having three or more distinct
values must be different from zero.

If $\nu$ were zero, (26) would contradict (22). Hence $\nu \ne 0$. With every distribution
of x there is associated another obtained from it by changing the sign
of each value of x. Such a pair of distributions we shall call opposite. When
we pass from a distribution to its opposite, the power-sums $a_k$ remain unchanged
when k is even and change only in sign when k is odd. Since a is
always positive, and since

(27)   $\nu = (a_1 a_2 - a_0 a_3)/a,$

$\nu$ has opposite signs [...] for two opposite distributions.
The conclusions to be reached shortly will be valid both for a distribution
and its opposite, and in reaching them we may assume $\nu > 0$. It will then
follow from (22) and (26) that $B_4 > 0$.

4. Distributions nullifying cubic bias with minimum quartic bias. We shall
now prove the following theorem:

Among distributions for which the cubic bias vanishes and the standard error of
$b_1$ has a fixed value, those for which the quartic bias is a minimum have exactly
three distinct values of the variable. These values satisfy the equation

(28)   $xy + yz + zx = 0.$

Since the standard error $\sigma$ of a single observation is not affected by the distribution
chosen for x, fixation of the standard error of $b_1$ is equivalent by (11)
to fixation of the value of the expression $\mu$ given by (12). We suppose therefore
that $\mu$ has some fixed positive value and that $B_3 = 0$. Since $B_3$, $\mu$ and $B_4$ do
not involve the distribution of x excepting through the power-sums $a_0, a_1, \cdots, a_6$,
we may treat these power-sums as the independent variables in trying to
make $B_4$ a minimum. Their region of variation is limited by the inequalities
referred to in the preceding section,

    $D_1 = a_0 > 0, \quad D_2 > 0, \quad D_3 = a > 0, \quad D_4 \ge 0.$

The inequalities $D_p \ge 0$ for p > 4 involve power-sums of orders higher than
the sixth and are irrelevant to our purpose.

The definition (8) of a shows that it is independent of $a_5$ and $a_6$; consequently
$\lambda$, $\mu$, and $\nu$ are also. According to (15), $B_3$ involves $a_5$ but not $a_6$, while of all
the expressions we have considered, only $B_4$ and $D_4$ are functions of $a_6$. Therefore
when $a_0, a_1, \cdots, a_5$ are given any definite values, $a_6$ may be chosen to
make $B_4$ a minimum without any regard to the fixed values of $\mu$ and $B_3$. Now
(15) shows that $B_4$ is a linear function of $a_6$ with a coefficient which, at the end
of the last section, we have proved not to be zero and assumed positive. Thus
$B_4$, which is also positive, is an increasing function of $a_6$. Its minimum will
correspond to the least value of $a_6$ consistent with the condition $D_4 \ge 0$. But
(21) shows that $D_4$ is also a linear function of $a_6$ with a positive coefficient,
a. The minimum of $a_6$, and therefore that of $B_4$, require therefore
that $D_4 = 0$. But $D_4 = 0$ is exactly the condition that there should be no more
than three distinct values in the distribution. Since there must be at least
three distinct values, and since if there are only three they must satisfy (19),
the theorem is proved.

The minimum value of $B_4$ with respect to variations of $a_6$ when $B_3 = 0$ may
be found by putting $D_4 = 0$ in (26). Designating this minimum by b and
using (27) we have

(29)   $b = \min B_4 = \frac{a_2 a_4 - a_3^2}{a_1 a_2 - a_0 a_3}.$
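Formula (29) can be evaluated directly from the power-sums of a three-point design satisfying (28). The sketch below is an illustration only: it uses the design 2, -3, -6 (the opposite of the earlier example -2, 3, 6, chosen so that the denominator of (29) is positive), two arbitrary sets of frequencies, and a hypothetical function name.

```python
from fractions import Fraction

def minimum_quartic_bias(values, freqs):
    """Right member of (29): (a2*a4 - a3^2) / (a1*a2 - a0*a3)."""
    a = [sum(n * Fraction(v) ** k for v, n in zip(values, freqs)) for k in range(5)]
    denominator = a[1] * a[2] - a[0] * a[3]
    if denominator == 0:
        raise ValueError("nu vanishes, so (29) does not apply")
    return (a[2] * a[4] - a[3] ** 2) / denominator

x, y, z = 2, -3, -6                                   # satisfies xy + yz + zx = 0
b_equal = minimum_quartic_bias((x, y, z), (1, 1, 1))
b_unequal = minimum_quartic_bias((x, y, z), (2, 1, 3))
```

Both sets of frequencies give the same value 36, which is also the value of -(x + y)(y + z)(z + x) from (20) and of the product xyz.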



MAXIMUM OF A FUNCTION 


31 


" |, r, ‘ 1 '"' numerator i.s intrinsically positive, and the denominator is positive 
for the elans of distributions we are now considering, though we might equally 
well consider the opposite distributions, for which it is negative. We have also 
from 

(30) $(x + y)(y + z)(z + x) = -b.$

Substituting for each of these binomials its value as given by (28), we may write
this in the simpler form

(31) $xyz = b > 0.$

It was shown at the beginning of section 3 that when there are only three
values in the distribution, with frequencies $k$ for $x$, $m$ for $y$, and $n$ for $z$,

(32) $a = kmnP^2 = kmn(x - y)^2(y - z)^2(z - x)^2.$

The first two rows of (17) form a matrix such that the sum of the squares of
its two-rowed determinants is

(33) $mn(y^2 - z^2)^2 + nk(z^2 - x^2)^2 + km(x^2 - y^2)^2.$

Since this is equal to the determinant of the sums of products of the rows,
namely

$\begin{vmatrix} a_0 & a_2 \\ a_2 & a_4 \end{vmatrix},$

it follows from (12), (32) and (33) that

(34) $\gamma = \dfrac{(y + z)^2}{k(x - y)^2(x - z)^2} + \dfrac{(z + x)^2}{m(x - y)^2(y - z)^2} + \dfrac{(x + y)^2}{n(x - z)^2(y - z)^2}.$

It is desired to minimize this expression, which is the factor of the variance
that is independent of the accuracy of the individual observations, while holding
$b = xyz$ fixed; or to minimize $b$ while holding $\gamma$ fixed. In either case the
values of $x$, $y$ and $z$ are to be chosen to satisfy (28). The relations established
by the solution of either of these virtually equivalent problems will fix $x$, $y$, and $z$
except for a factor of proportionality, which must then be adjusted to provide
a balance as satisfactory as possible between random errors and bias.
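
The identities of this section may be checked in exact arithmetic. The following sketch takes three hypothetical values satisfying $xy + yz + zx = 0$ with hypothetical frequencies, and assumes that (29) is the ratio $(a_2a_4 - a_3^2)/(a_1a_2 - a_0a_3)$ and that the variance factor of (34), written $\gamma$ here, is $(a_0a_4 - a_2^2)/a$.

```python
from fractions import Fraction as F

x, y, z = F(-6), F(-3), F(2)   # chosen so that xy + yz + zx = 0
k, m, n = 2, 3, 5              # hypothetical frequencies

def a(j):
    """Power sum a_j of the three-point distribution."""
    return k * x**j + m * y**j + n * z**j

# a: the three-rowed determinant of the power sums a_0 ... a_4
a_det = (a(0) * (a(2) * a(4) - a(3) ** 2)
         - a(1) * (a(1) * a(4) - a(2) * a(3))
         + a(2) * (a(1) * a(3) - a(2) ** 2))

P2 = ((x - y) * (y - z) * (z - x)) ** 2
b_29 = (a(2) * a(4) - a(3) ** 2) / (a(1) * a(2) - a(0) * a(3))
gamma_34 = ((y + z) ** 2 / (k * (x - y) ** 2 * (x - z) ** 2)
            + (z + x) ** 2 / (m * (x - y) ** 2 * (y - z) ** 2)
            + (x + y) ** 2 / (n * (x - z) ** 2 * (y - z) ** 2))
gamma_ratio = (a(0) * a(4) - a(2) ** 2) / a_det

print(a_det == k * m * n * P2)                               # relation (32)
print(b_29 == x * y * z == -(x + y) * (y + z) * (z + x))      # (29), (30), (31)
print(gamma_34 == gamma_ratio)                                # relation (34)
```

Because rational arithmetic is used, the three comparisons are exact rather than approximate.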

5. The quintic bias. Effect of $\delta b_2$. With any distribution determined in
this way will be associated its opposite distribution, which will have the same
minimizing properties so far as the variance and the cubic and quartic components
of bias are concerned. The appropriate choice between these two opposite
distributions will in general involve the quintic component of the bias.
At this point we must, for the first time, take account of the errors in the
denominator $2b_2$ of $x_0$.

Since $b_1$ converges stochastically to $Eb_1$, and $b_2$ to $Eb_2$, the error $x_0 = -b_1/2b_2$
converges stochastically (for large samples) to $-\tfrac{1}{2}Eb_1/Eb_2$. By keeping our
values of $x$ close enough to $\xi$ we may insure that $Eb_2$ differs as little as we please
from $\beta_2$, and hence that the series

$\dfrac{Eb_1}{Eb_2} = \dfrac{Eb_1}{\beta_2 + E\delta b_2} = \dfrac{Eb_1}{\beta_2}\left[1 - \dfrac{E\delta b_2}{\beta_2} + \dfrac{(E\delta b_2)^2}{\beta_2^2} - \cdots\right]$

converges rapidly. Let us rearrange this series after inserting for $Eb_1$ and $E\delta b_2$
their values, so as to obtain a series in ascending powers of a common multiplier
$g$ which may be applied to the values of $x$. We recall that in the expression
(13) for $Eb_1$, $B_3$ is of the second order in $g$, $B_4$ is of the third order, $B_5$ is of the
fourth order, and so forth. In the same way, we find that

$E\delta b_2 = C_3\beta_3 + C_4\beta_4 + \cdots,$

where

$C_3 = \dfrac{1}{a}\begin{vmatrix} a_0 & a_1 & a_3 \\ a_1 & a_2 & a_4 \\ a_2 & a_3 & a_5 \end{vmatrix}$

is of the first order, $C_4$ is of the second order, and so forth. Thus in

$\beta_2\dfrac{Eb_1}{Eb_2} = B_3\beta_3 + (B_4\beta_4 - B_3C_3\beta_3^2/\beta_2) + (B_5\beta_5 - B_3C_4\beta_3\beta_4/\beta_2 - B_4C_3\beta_3\beta_4/\beta_2 + B_3C_3^2\beta_3^3/\beta_2^2) + \cdots,$

the first term is of the second order, those in the first parentheses are of the
third order, those in the second parentheses are of fourth order, and the remaining
terms are of higher orders.

We have seen that we can choose distributions for which $B_3 = 0$. In this
way we get rid of the second-order term and reduce the third-order terms to
$B_4\beta_4$. We shall in the next two sections show how, under various conditions,
to select from among the distributions for which $B_3 = 0$ an opposite pair for
each of which $|B_4|$ is a minimum. In choosing between these two opposite
distributions, the criterion we shall adopt is that the terms of third order and
those of fourth order shall have opposite signs; for while the fourth-order terms
may be made much smaller than those of third order in absolute value, still it
is desirable that they should offset them, in order to reduce the error. The
terms of third and of fourth orders reduce respectively for $B_3 = 0$ to $B_4\beta_4$ and
to $B_5\beta_5 - B_4C_3\beta_3\beta_4/\beta_2$. Our criterion is that these are to have opposite signs,
and consequently that 

$\beta_2(B_4B_5\beta_2\beta_4\beta_5 - B_4^2C_3\beta_3\beta_4^2) < 0.$

We shall however modify this criterion whenever $\sigma$ is not negligibly small.
A more precise criterion will be obtained by expanding $x_0$ in a series of powers
of $\delta b_2$, taking the expectation term by term, and reducing the moments thus
obtained of orders higher than the second to those of first and second orders by
means of the theory of the bivariate normal distribution of $b_1$ and $b_2$. It is
then necessary to make some assumption regarding the order of magnitude of
$x$, $y$ and $z$ relatively to $N$ in order to assemble terms of like magnitude in a
criterion resembling that above but involving $\sigma$. The appropriate balance
indicated by the results of the next two sections calls for $x$, $y$ and $z$ to be of the
order of $N^{-1/8}$. This leads to the following criterion:

$\beta_2(B_4B_5\beta_2\beta_4\beta_5 - B_4^2C_3\beta_3\beta_4^2 - C_3\beta_3\gamma\sigma^2) < 0.$

We have seen that $B_4 = b = xyz$. To evaluate $C_3$ and $B_5$, which latter
may in accordance with (15) be written

$B_5 = -\dfrac{1}{a}\begin{vmatrix} a_0 & a_2 & a_5 \\ a_1 & a_3 & a_6 \\ a_2 & a_4 & a_7 \end{vmatrix},$

we proceed as in section 3, replacing the second row of (17) by the first
powers to obtain $C_3$, and replacing the third row of (17) by the fifth powers
of $x$, $y$ and $z$ to obtain $B_5$. In this way we find

$C_3 = \dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^3 & y^3 & z^3 \end{vmatrix}, \qquad B_5 = -\dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^5 & y^5 & z^5 \end{vmatrix}.$

Letting $\Sigma x$, $\Sigma x^2yz$, etc. stand for the symmetric functions of $x$, $y$ and $z$ of which
one term is written in each case after $\Sigma$, we may reduce these expressions to

$C_3 = \Sigma x,$

$B_5 = -\Sigma x^3y - \Sigma x^2y^2 - 2\Sigma x^2yz.$

With the help of (28) and (31) we find

$\Sigma x^2yz = xyz\,\Sigma x = b\,\Sigma x,$

$\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2b\,\Sigma x,$

$\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz = -b\,\Sigma x.$

Therefore $B_5 = b\,\Sigma x$. Substituting these values for $B_5$, $C_3$ and $B_4$ in the
last inequality gives the rule:

Choose that one of a pair of opposite distributions for which 

(35) $(x + y + z)\beta_2[b^2\beta_4(\beta_2\beta_5 - \beta_3\beta_4) - \beta_3\gamma\sigma^2] < 0.$

It will be remembered that $\beta_2$ is negative for a maximum of $f(x)$, positive for a
minimum. The other $\beta$'s can only be estimated from preliminary experimentation,
or possibly in particular cases from general knowledge or theory.
Quite different algebraic methods are appropriate to minimizing $\gamma$ with a fixed
$b$ according to the limitations to be placed on the frequencies $k$, $m$, $n$: the methods
leading very simply to a solution in one case involve troublesome complications
in another. We shall deal with two of the leading cases.
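
The values found in this section for $B_4$, $B_5$ and $C_3$ can be verified directly. The sketch below rests on an assumption about their meaning: with only three distinct values the fitted quadratic passes through the three mean ordinates, so that the coefficient multiplying $\beta_j$ in $Eb_1$ (or in $Eb_2$) is taken to be the linear (or quadratic) coefficient of the quadratic interpolating $u^j$ at $x$, $y$, $z$. The three values are hypothetical.

```python
from fractions import Fraction as F

def quadratic_through(nodes, values):
    """Coefficients (c0, c1, c2) of the quadratic interpolating three points."""
    c0 = c1 = c2 = F(0)
    for i, (xi, vi) in enumerate(zip(nodes, values)):
        p, q = [nodes[j] for j in range(3) if j != i]
        w = vi / ((xi - p) * (xi - q))      # Lagrange weight of this node
        c0 += w * p * q
        c1 -= w * (p + q)
        c2 += w
    return c0, c1, c2

nodes = [F(-6), F(-3), F(2)]                # xy + yz + zx = 0, xyz = 36
b = nodes[0] * nodes[1] * nodes[2]
sum_x = sum(nodes)

def B(j):
    """Linear coefficient when the quadratic is fitted to u**j."""
    return quadratic_through(nodes, [u**j for u in nodes])[1]

def C(j):
    """Quadratic coefficient when the quadratic is fitted to u**j."""
    return quadratic_through(nodes, [u**j for u in nodes])[2]

print(B(3), B(4), B(5), C(3))
```

Under these assumptions $B_3$ vanishes, $B_4$ equals $xyz$, $B_5$ equals $b\,\Sigma x$ and $C_3$ equals $\Sigma x$, in agreement with the reductions above.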


6. The case of equal frequencies. Some experimental situations call for equal
frequencies for all values of the variable. If $k = m = n$, then $a_0 = N = 3n$.
Let $a_j' = a_j/n$. Then $a_0' = 3$ and $a_1' = \Sigma x$. Inasmuch as

(36) $\Sigma xy = 0$ and $xyz = b,$

we may express $a_2'$, $a_3'$ and $a_4'$ as functions of $a_1'$ and $b$ as follows:

$a_2' = \Sigma x^2 = (\Sigma x)^2 - 2\Sigma xy = a_1'^2,$

$a_3' = \Sigma x^3 = (\Sigma x)^3 - 3\Sigma x^2y - 6xyz;$

and since $\Sigma x^2y = \Sigma x\,\Sigma xy - 3xyz$ we have from (36),

$a_3' = a_1'^3 + 3b.$

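We have also

$a_4' = \Sigma x^4 = (\Sigma x)^4 - 4\Sigma x^3y - 6\Sigma x^2y^2 - 12\Sigma x^2yz,$

and since

$\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz, \qquad \Sigma x^2yz = xyz\,\Sigma x = a_1'b,$

it follows that

$\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2a_1'b.$

Therefore

$a_4' = a_1'^4 + 4a_1'b,$

$a = n^3\begin{vmatrix} 3 & a_1' & a_1'^2 \\ a_1' & a_1'^2 & a_1'^3 + 3b \\ a_1'^2 & a_1'^3 + 3b & a_1'^4 + 4a_1'b \end{vmatrix}.$

Upon subtracting $a_1'$ times the second column from the third, and $a_1'$ times the
first from the second, this becomes

$a = n^3\begin{vmatrix} 3 & -2a_1' & 0 \\ a_1' & 0 & 3b \\ a_1'^2 & 3b & a_1'b \end{vmatrix} = -n^3b(4a_1'^3 + 27b).$

Also,

$a_0a_4 - a_2^2 = n^2[3(a_1'^4 + 4a_1'b) - a_1'^4] = 2n^2(a_1'^4 + 6a_1'b).$

Hence, by (12),

(37) $\gamma = \dfrac{a_0a_4 - a_2^2}{a} = -\dfrac{2}{nb}\,\dfrac{a_1'^4 + 6a_1'b}{4a_1'^3 + 27b}.$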


Differentiating with respect to $a_1'$ to find a minimum, we obtain

$0 = (4a_1'^3 + 27b)(4a_1'^3 + 6b) - 12a_1'^2(a_1'^4 + 6a_1'b) = 4a_1'^6 + 60a_1'^3b + 162b^2.$

The minimum of $\gamma$, for $b$ fixed, and satisfying the condition $4a_1'^3 + 27b < 0$,
which is equivalent to $a > 0$ since we assume $b > 0$, is attained when $a_1'^3 = bq$
where $q$ is the numerically greater root of the equation $4q^2 + 60q + 162 = 0$;
that is,

$q = -(15 + \sqrt{63})/2 = -11.468\,626\,97.$

The elementary symmetric functions of the values $x$, $y$, $z$ composing the distribution
are

$\Sigma x = a_1' = (bq)^{1/3}, \qquad \Sigma xy = 0, \qquad xyz = b.$

Hence $x$, $y$ and $z$ must be the roots of the equation in $u$,

(38) $u^3 - (bq)^{1/3}u^2 - b = 0.$

If we put $u = -(bq)^{1/3}v$, this becomes

$v^3 + v^2 + q^{-1} = 0.$

Calculation gives approximately $q^{-1} = -.087\,194\,396$, and for the roots of the equation in $v$,

(39) .2628, -.3729, -.8899,

numbers which are therefore proportional to the values of the variable that
should be chosen when the frequencies must be equal. If any values $x$, $y$, $z$
proportional to these are used, the value (37) of $\gamma$ is

(40) $\gamma' = -\dfrac{6}{N}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}b^{-2/3},$

and is the minimum consistent with any fixed value $b$ of $xyz$.
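
The numbers (39) and the value (40) can be recomputed as follows. This is a sketch under stated assumptions: $b = 1$ and $N = 3$ are arbitrary illustrative values, the roots of (38) are found by simple bisection, the variance factor (written $\gamma$ here) is evaluated from (34) with equal frequencies, and (40) is read as $-(6/N)\,(q + 6)/(4q + 27)\,q^{1/3}b^{-2/3}$.

```python
import math

q = -(15 + math.sqrt(63)) / 2           # root of 4q^2 + 60q + 162 = 0 with 4q + 27 < 0

def cbrt(t):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(t) ** (1 / 3), t)

def bisect(f, lo, hi, steps=200):
    """Root of f in [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b, N = 1.0, 3.0                         # illustrative values; N/3 = 1 observation per point
e1 = cbrt(b * q)                        # the sum of the three values
cubic = lambda u: u**3 - e1 * u**2 - b  # left side of equation (38)
roots = [bisect(cubic, 0, 1), bisect(cubic, -1.5, 0), bisect(cubic, -3, -1.5)]
scaled = [u / cbrt(-b * q) for u in roots]   # roots divided by (-bq)^(1/3)

x, y, z = roots
n = N / 3
gamma_34 = ((y + z) ** 2 / (n * (x - y) ** 2 * (x - z) ** 2)
            + (z + x) ** 2 / (n * (x - y) ** 2 * (y - z) ** 2)
            + (x + y) ** 2 / (n * (x - z) ** 2 * (y - z) ** 2))
gamma_40 = -(6 / N) * (q + 6) / (4 * q + 27) * cbrt(q) * b ** (-2 / 3)

print([round(v, 4) for v in scaled])
print(round(gamma_34, 6), round(gamma_40, 6))
```

The scaled roots may be compared with (39), and the two evaluations of the variance factor with each other.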

Choice of the factor of proportionality will involve a compromise between
the criteria of minimum sampling variance and minimum bias. If we ignore
components of bias of orders higher than the fourth and recall (10) and (11) it
will appear that the appropriate combined criterion is that

(41) $\beta_4^2b^2 + \gamma\sigma^2$

shall be a minimum. Putting for $\gamma$ its value $\gamma'$ from (40) and differentiating
with respect to $b$ gives

$2\beta_4^2b + \dfrac{4\sigma^2}{N}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}b^{-5/3} = 0,$

or

$b = b' = \left(-\dfrac{2\sigma^2}{N\beta_4^2}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}\right)^{3/8}.$


The product of the three roots (39) is $-q^{-1}$. Numbers proportional to them and
having the product $b'$ will be obtained by multiplying them by $(-b'q)^{1/3}$; that
is, by

$(-q)^{1/3}\left(-\dfrac{2\sigma^2}{N\beta_4^2}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}\right)^{1/8}.$

Multiplying (39) by 2.3318 gives numbers

(42) .6128, -.8695, -2.0751,

which must still be multiplied by $\pm[\sigma^2/(N\beta_4^2)]^{1/8}$ to give the set minimizing
$E\,\delta b_1^2$. The ambiguous sign is to be fixed according to the rule at the end of
the last section. Thus we arrive finally at the conclusion:

If the numbers of observations are required to be the same for all the values of the
variable used, these values should for greatest efficiency deviate from the estimated
maximizing value by the products of the three numbers (42) by

(43) $\pm[\sigma^2/(N\beta_4^2)]^{1/8},$

choosing the ambiguous sign so as to satisfy (35).

The product $b'$ of the three values is to be substituted for $b$ in (40) and (35),
and the value of $\gamma$ thus obtained from (40) is also to be substituted in (35).
These substitutions yield

$(x + y + z)\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) < 0$

as the criterion for choosing the sign in (43).

The expectation of the square of the error in the estimate of the value of $\xi$
is, according to (9) and (10), given approximately by the ratio of (41) to $4\beta_2^2$,
and it is this that will be a minimum when the foregoing rule is followed. The
minimum of (41) is obtained by replacing $b$ by $b'$ in (40) and (41), and substituting
(40) for $\gamma$ in (41). This gives

$4\beta_4^2\left(-\dfrac{2\sigma^2}{N\beta_4^2}\,\dfrac{q + 6}{4q + 27}\,q^{1/3}\right)^{3/4};$

that is,

(44) $E\,\delta b_1^2 = 4.889\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$
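
The numerical constants of this section follow from $q$ alone. The sketch below assumes that (40) has the form $(c/N)b^{-2/3}$, so that minimizing $\beta_4^2b^2 + \gamma\sigma^2$ gives $b'^{8/3} = c\sigma^2/(3N\beta_4^2)$ and, at the minimum, $\gamma\sigma^2 = 3\beta_4^2b'^2$; the variable names are illustrative only.

```python
import math

q = -(15 + math.sqrt(63)) / 2
cbrt_q = -((-q) ** (1 / 3))                  # real cube root of the negative number q

c = -6 * (q + 6) / (4 * q + 27) * cbrt_q     # (40) read as (c / N) * b**(-2/3)
b_coef = (c / 3) ** (3 / 8)                  # b' in units of [sigma^2/(N beta_4^2)]**(3/8)
scale = (-q * b_coef) ** (1 / 3)             # multiplier applied to the numbers (39)
coef_44 = 4 * b_coef ** 2                    # minimum of (41), in units of
                                             # N**(-3/4) * beta_4**(1/2) * sigma**(3/2)

roots_39 = (0.2628, -0.3729, -0.8899)
values_42 = [scale * v for v in roots_39]
print(round(scale, 4), [round(v, 4) for v in values_42], round(coef_44, 3))
```

The multiplier, the scaled values and the final coefficient may be compared with 2.3318, with (42) and with (44) respectively.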

7. Adjustable frequencies. If the total number $N$ of observations to be made
can be distributed freely among the values of the variable, the efficiency of the
experiment can be increased by a proper selection of the individual frequencies
$k$, $m$, $n$ along with the corresponding values $x$, $y$, $z$. We shall choose these
six unknowns, subject to the three conditions*

(45) $k + m + n = N,$

(46) $xy + xz + yz = 0,$

(47) $xyz = -b,$

to minimize $\gamma$. The last condition fixes the quartic bias, the preceding one expresses
the vanishing of the cubic bias. It is of course understood that $k$, $m$, $n$
are all positive, and we shall, as before, suppose initially that $b$ is positive. No
two of $x$, $y$, $z$ can be equal, and it follows that none of them, or of the sums
of two of them, can be zero while satisfying the second condition. We shall
lose no generality in assuming that

(48) $x > y > 0 > z.$

Furthermore, it is easy to see that $x + y$, $x + z$, and $y + z$ are all positive.
Therefore the quantities


(49) $r = \dfrac{y + z}{(x - y)(x - z)}, \qquad s = \dfrac{x + z}{(x - y)(y - z)}, \qquad t = \dfrac{x + y}{(x - z)(y - z)}$

are all positive. From (34) we have

(50) $\gamma = \dfrac{r^2}{k} + \dfrac{s^2}{m} + \dfrac{t^2}{n}.$

The values of $k$, $m$, $n$ making this a minimum while themselves subject to the
limitation that their sum is $N$ must, if they were continuous positive variables,
be proportional to $r$, $s$ and $t$. Of course the frequencies are integers, but we are
supposing $N$ large, so that the values found by differentiation will be close
approximations, and we shall disregard this complication. Put therefore

(51) $r = kp, \quad s = mp, \quad t = np,$

where $p$ is a multiplier which evidently is not zero. If we use these equations
to eliminate $r$, $s$, $t$ from $\gamma$ we obtain, with the help of (45), $\gamma = Np^2$. But if we
use them to eliminate $k$, $m$, $n$ from (50) we have instead,

$\gamma = (r + s + t)p.$

Now from (49),

(52) $r + t = s,$


* The condition (47) is here used instead of (31), from which it differs by the introduction
of the negative sign, because it simplifies the argument of this section slightly to have the
quantities (49) positive. There is no essential difference, since we are seeking a pair of
opposite distributions.



so that $\gamma = 2sp$. Therefore $Np = 2s$, and finally $\gamma = 4s^2/N$. Therefore $\gamma$ is
a minimum when the positive quantity $s$ is a minimum. In the expression
(49) for $s$ we substitute from (46) and (47)

(53) $x + z = -xz/y = b/y^2,$
$(x - y)(y - z) = (x + z)y - xz - y^2 = 2b/y - y^2,$

so that

(54) $s = \dfrac{b}{y(2b - y^3)}.$

Since $y$, $s$ and $b$ are positive, this shows that $y^3 < 2b$. The value of $y$ on the
interval from 0 to $(2b)^{1/3}$ making $s$ a minimum is found by differentiation to be
$2^{-1/3}b^{1/3}$. Substituting this in (53) and (47) gives

$x + z = 2^{2/3}b^{1/3}, \qquad xz = -2^{1/3}b^{2/3},$

whence

(55) $x = (b/2)^{1/3}(1 + \sqrt{3}), \quad y = (b/2)^{1/3}, \quad z = (b/2)^{1/3}(1 - \sqrt{3}).$


From (45), (51) and (52) it is seen that $k + n = m = N/2$. Thus half the total
observations are to be concentrated on the middle value. From (51) and (49)
we have also

$\dfrac{k}{n} = \dfrac{r}{t} = \dfrac{y^2 - z^2}{x^2 - y^2},$

wherefore

$k = \dfrac{N}{2}\,\dfrac{y^2 - z^2}{x^2 - z^2}, \qquad n = \dfrac{N}{2}\,\dfrac{x^2 - y^2}{x^2 - z^2}.$

With (55) this shows that

(56) $k = N(2 - \sqrt{3})/8, \quad m = N/2, \quad n = N(2 + \sqrt{3})/8;$
$k = .03349N, \quad n = .46651N.$
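
The allocation just obtained may be checked numerically. In this sketch $b = 1$ is an arbitrary illustrative value; the frequencies are taken proportional to the quantities $r$, $s$, $t$ of (49), and the minimizing middle value is located by a coarse grid search on the expression (54) rather than by differentiation.

```python
import math

b = 1.0                                    # illustrative value of the fixed quantity b
c = (b / 2) ** (1 / 3)
x, y, z = c * (1 + math.sqrt(3)), c, c * (1 - math.sqrt(3))   # values (55)

r = (y + z) / ((x - y) * (x - z))          # quantities (49)
s = (x + z) / ((x - y) * (y - z))
t = (x + y) / ((x - z) * (y - z))

# Frequencies proportional to r, s, t, expressed as fractions of N
total = r + s + t
shares = (r / total, s / total, t / total)

def s_of_y(v):
    """Expression (54) for s as a function of the middle value y."""
    return b / (v * (2 * b - v**3))

grid = [i / 1000 * (2 * b) ** (1 / 3) for i in range(1, 1000)]
y_best = min(grid, key=s_of_y)             # grid point giving the smallest s

print(shares)
print(y_best, y)
```

The three shares may be compared with (56), and the grid minimum with the middle value in (55).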


We have seen that $\gamma = 4s^2/N$. Substituting in (54) the value found for $y$
gives $s = 2^{4/3}b^{-1/3}/3$. Therefore the minimum of $\gamma$ for a fixed value of $b$ is

(57) $\gamma = (16/9N)(2/b)^{2/3}.$

Inserting this in the expression (41) for the total expectation of the squared
error and then differentiating with respect to $b$ gives

(58) $b = 2^{7/4}\,3^{-9/8}N^{-3/8}\beta_4^{-3/4}\sigma^{3/4}.$

When this value is given to $b$, (41) becomes

(59) $3.8207\,N^{-3/4}\beta_4^{1/2}\sigma^{3/2}.$

The greater efficiency of experiments with the frequencies (56) and the correspondingly
adjusted values $x$, $y$, $z$, in comparison with the case in which the
frequencies must be equal, corresponds to the smaller coefficient in (59) than in
(44). To obtain as great accuracy with equal frequencies as with adjusted
ones it is necessary to have more observations, in a ratio obtained by equating
(59) with (44) after inserting different symbols for $N$ in the two cases. In this
way it is found that the number of observations required with efficient distribution
of the frequencies is almost exactly 72 per cent of the number required
when the frequencies are equal, if the values $x$, $y$, $z$ are in each case given their
most efficient values.
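
The comparison of the two designs reduces to a short calculation. The sketch assumes that in both cases the variance factor has the form $(c/N)b^{-2/3}$, with $c$ taken from (40) for equal frequencies and from (57) for adjustable ones, so that the minimum of (41) is $4(c/3)^{3/4}N^{-3/4}\beta_4^{1/2}\sigma^{3/2}$; the names are illustrative.

```python
import math

q = -(15 + math.sqrt(63)) / 2
cbrt_q = -((-q) ** (1 / 3))                      # real cube root of q

c_equal = -6 * (q + 6) / (4 * q + 27) * cbrt_q   # from (40)
c_adj = (16 / 9) * 2 ** (2 / 3)                  # from (57)

coef_equal = 4 * (c_equal / 3) ** (3 / 4)        # coefficient in (44)
coef_adj = 4 * (c_adj / 3) ** (3 / 4)            # coefficient in (59)
b_adj = (c_adj / 3) ** (3 / 8)                   # numerical factor in (58)

# Equating coef_adj * N_adj**(-3/4) with coef_equal * N_equal**(-3/4)
ratio = (coef_adj / coef_equal) ** (4 / 3)       # N_adj / N_equal

print(round(coef_equal, 3), round(coef_adj, 4), round(ratio, 4))
```

The ratio so obtained is the proportion of observations needed with adjusted frequencies relative to equal ones.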

Substituting (58) in (55) gives the numbers 

(60) 2.1520, .7877, -.2110, 

multiplied by (43), with a change of signs if necessary to satisfy (35), as the 
values $x$, $y$, $z$ of the variable to be used. The more concentrated character of
this distribution with adjustable frequencies is emphasized by the small proportion,
less than 3½ per cent, of the frequencies (56) that pertains to the value most
remote from the tentative maximizing value.

When (58) is substituted in (57) and, with the result, in (35), this inequality 
reduces to exactly the same form as that obtained in the preceding section for 
fixing the sign of (43). 
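
As a check on the numbers (60), the values (55) may be evaluated at the value (58) of $b$, taking the factor $[\sigma^2/(N\beta_4^2)]^{3/8}$ as unity. This is only a sketch of the arithmetic. In it the first two values agree with (60) to the printed accuracy, while the third comes out near $-.5766$ rather than $-.2110$; since the computed triple satisfies (46) and (47), the third printed value may be worth re-checking against (55).

```python
import math

b_coef = 2 ** (7 / 4) * 3 ** (-9 / 8)      # (58) in units of [sigma^2/(N beta_4^2)]**(3/8)
c = (b_coef / 2) ** (1 / 3)
x, y, z = c * (1 + math.sqrt(3)), c, c * (1 - math.sqrt(3))   # values (55)

cubic_bias = x * y + y * z + z * x         # condition (46): should vanish
quartic_check = x * y * z + b_coef         # condition (47): should vanish

print(round(x, 4), round(y, 4), round(z, 4))
print(cubic_bias, quartic_check)
```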

8. Introduction to the two-variable problem. Functions of two or more 
variables are of greater practical importance than functions of one variable. 
The recent work on factorial experiments [8] makes it clear that in the experi¬ 
mental determination of maxima of functions of several variables, considerable 
improvements are possible over the practice of trying the effect of variations 
in only one variable at a time while holding the others constant. It seems likely 
that the methods worked out in the previous sections for experimenting with 
one variable are capable of generalization. However certain difficulties enter 
which have not yet been surmounted. The object of the present section is to 
indicate something of the nature of the problem of extending the foregoing 
results to two variables, x and y. 

Let us suppose that a quadratic regression equation,

$Z = b_{00} + b_{10}x + b_{01}y + \tfrac{1}{2}(b_{20}x^2 + 2b_{11}xy + b_{02}y^2),$

will be fitted by least squares to observations of $z = f(x, y)$ based on $N$ combinations
of $x$ and $y$, each of which represents a point in a plane. Since there are
six coefficients to be determined, there must be at least six distinct points
$(x_1, y_1), \cdots, (x_6, y_6)$. The coefficients in the normal equations may be written
$a_{jk} = \Sigma x^jy^k$, so that $a_{00} = N$. The determinant


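$a = \begin{vmatrix}
a_{00} & a_{10} & a_{01} & a_{20} & a_{11} & a_{02} \\
a_{10} & a_{20} & a_{11} & a_{30} & a_{21} & a_{12} \\
a_{01} & a_{11} & a_{02} & a_{21} & a_{12} & a_{03} \\
a_{20} & a_{30} & a_{21} & a_{40} & a_{31} & a_{22} \\
a_{11} & a_{21} & a_{12} & a_{31} & a_{22} & a_{13} \\
a_{02} & a_{12} & a_{03} & a_{22} & a_{13} & a_{04}
\end{vmatrix}$

must not vanish. Let the function under investigation be

$f(x, y) = \Sigma\Sigma\,\beta_{jk}x^jy^k/(j!\,k!),$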

and suppose that $\beta_{10} = 0 = \beta_{01}$, so that the origin is the point sought at which
the first derivatives vanish. We shall assume that

$\beta = \beta_{20}\beta_{02} - \beta_{11}^2 > 0, \qquad \beta_{20} < 0,$

implying a definite maximum. The estimates $x_0$, $y_0$ of the maximizing (or
minimizing) values obtained by differentiating $Z$ are

$x_0 = (b_{11}b_{01} - b_{02}b_{10})/b, \qquad y_0 = (b_{11}b_{10} - b_{20}b_{01})/b,$

where

$b = b_{20}b_{02} - b_{11}^2.$

For large samples and values of $x$ and $y$ taken not too far from the origin, $b$ will
approximate to $\beta$, and $x_0$ and $y_0$ respectively to

$(\beta_{11}b_{01} - \beta_{02}b_{10})/\beta, \qquad (\beta_{11}b_{10} - \beta_{20}b_{01})/\beta.$

Some means is needed of combining into one the two desiderata of minimizing
the errors $x_0$ and $y_0$. A combined measure of these deviations is

$\beta_{20}x_0^2 + 2\beta_{11}x_0y_0 + \beta_{02}y_0^2.$

This expression is constant except for terms of higher order when $x_0$ and $y_0$,
while remaining small, vary in such a way that $f(x, y)$ maintains a constant
value. Substituting in it the approximate values of $x_0$ and $y_0$ gives $\beta^{-1}$ times

$\beta_{02}b_{10}^2 - 2\beta_{11}b_{10}b_{01} + \beta_{20}b_{01}^2.$
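
The last substitution can be confirmed in exact arithmetic. The coefficients in this sketch are hypothetical, chosen only so that $\beta > 0$ and $\beta_{20} < 0$; the approximate values of $x_0$ and $y_0$ given above are used.

```python
from fractions import Fraction as F

# Hypothetical second-order coefficients with beta > 0 and beta_20 < 0 (a maximum)
beta20, beta11, beta02 = F(-3), F(1), F(-2)
beta = beta20 * beta02 - beta11**2

# Hypothetical fitted first-order coefficients
b10, b01 = F(1, 4), F(-1, 5)

# Approximate values of the estimates x0 and y0
x0 = (beta11 * b01 - beta02 * b10) / beta
y0 = (beta11 * b10 - beta20 * b01) / beta

measure = beta20 * x0**2 + 2 * beta11 * x0 * y0 + beta02 * y0**2
closed_form = (beta02 * b10**2 - 2 * beta11 * b10 * b01 + beta20 * b01**2) / beta

# First derivatives of the fitted quadratic at (x0, y0); both should vanish
grad_x = b10 + beta20 * x0 + beta11 * y0
grad_y = b01 + beta11 * x0 + beta02 * y0

print(measure == closed_form, grad_x, grad_y)
```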

The expectation of this measure of error may be separated into two parts by
means of the formulae for the variances and covariance,

$\sigma_{b_{10}}^2 = Eb_{10}^2 - (Eb_{10})^2, \qquad \sigma_{b_{10}b_{01}} = Eb_{10}b_{01} - (Eb_{10})(Eb_{01}),$ etc.

One of these parts is a generalized sampling variance,

$\beta_{02}\sigma_{b_{10}}^2 - 2\beta_{11}\sigma_{b_{10}b_{01}} + \beta_{20}\sigma_{b_{01}}^2,$

and tends to zero with order $N^{-1}$ as $N$ increases provided the values $(x_k, y_k)$ are
fixed. The other part,

(61) $\beta_{02}(Eb_{10})^2 - 2\beta_{11}(Eb_{10})(Eb_{01}) + \beta_{20}(Eb_{01})^2,$

is a bias which does not tend to zero as $N$ increases, but which may be kept
arbitrarily small, at the expense of the sampling variance, by restricting the
values $(x_k, y_k)$ to be sufficiently small. This expression is a negative definite
quadratic form in $Eb_{10}$ and $Eb_{01}$, and therefore cannot be zero unless both these
components of bias vanish separately.

We may proceed as in paragraph 2 to express $Eb_{10}$ and $Eb_{01}$ in terms of the
coefficients of $f(x, y)$ of orders higher than the second, among which those of
third order will be of leading importance. In this way it may be shown that,
if we neglect terms in $f(x, y)$ of orders higher than the third, $Eb_{10}$ and $Eb_{01}$ are
given by the ratios to a constant multiple of $a$ of determinants obtained from
$a$ by replacing respectively the second and the third columns by the column

$\beta_{30}a_{30} + 3\beta_{21}a_{21} + 3\beta_{12}a_{12} + \beta_{03}a_{03}$
$\beta_{30}a_{40} + 3\beta_{21}a_{31} + 3\beta_{12}a_{22} + \beta_{03}a_{13}$
$\beta_{30}a_{31} + 3\beta_{21}a_{22} + 3\beta_{12}a_{13} + \beta_{03}a_{04}$
$\beta_{30}a_{50} + 3\beta_{21}a_{41} + 3\beta_{12}a_{32} + \beta_{03}a_{23}$
$\beta_{30}a_{41} + 3\beta_{21}a_{32} + 3\beta_{12}a_{23} + \beta_{03}a_{14}$
$\beta_{30}a_{32} + 3\beta_{21}a_{23} + 3\beta_{12}a_{14} + \beta_{03}a_{05}.$

It is desirable to select a distribution of points $(x_k, y_k)$ such that these components
of bias will vanish, no matter what may be the values of $\beta_{30}$, $\beta_{21}$, $\beta_{12}$ and
$\beta_{03}$. For this it is necessary and sufficient that all the determinants vanish
that are obtained from these two by replacing the column written above by the
terms in it that multiply any one of the four $\beta$'s. The single-variable analogy
suggests using a distribution having the smallest possible number of points,
which in this case is six. Let us now take $N = 6$. The eight determinants will
all be multiples of


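$P = \begin{vmatrix}
1 & x_1 & y_1 & x_1^2 & x_1y_1 & y_1^2 \\
1 & x_2 & y_2 & x_2^2 & x_2y_2 & y_2^2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_6 & y_6 & x_6^2 & x_6y_6 & y_6^2
\end{vmatrix}.$

To save space we shall indicate determinants of this character merely by writing
a single row without subscripts, thus:

$P = |\,1 \;\; x \;\; y \;\; x^2 \;\; xy \;\; y^2\,|.$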




If we define

$\Delta'_{jk} = |\,1 \;\; x^jy^k \;\; y \;\; x^2 \;\; xy \;\; y^2\,|,$

$\Delta''_{jk} = |\,1 \;\; x \;\; x^jy^k \;\; x^2 \;\; xy \;\; y^2\,|,$

and multiply each of these determinants for which $j + k = 3$ $(j, k = 0, 1, 2, 3)$
by $P$, columns by columns, we shall have exactly the determinants whose vanishing
is the condition for nullification of the cubic bias. If we multiply $P$ by
itself in the same way we have $P^2 = a$. Therefore $P \ne 0$. Therefore the required
condition is that the distribution satisfy the eight equations

$\Delta'_{30} = 0, \quad \Delta'_{21} = 0, \quad \Delta'_{12} = 0, \quad \Delta'_{03} = 0,$

$\Delta''_{30} = 0, \quad \Delta''_{21} = 0, \quad \Delta''_{12} = 0, \quad \Delta''_{03} = 0,$

and the inequality $P \ne 0$.
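
The relation $P^2 = a$ for $N = 6$ is easily illustrated. In the sketch below the six design points are hypothetical, and $a$ is formed as the determinant of the sums of products of the columns $1, x, y, x^2, xy, y^2$, which is how the normal-equation determinant is understood here.

```python
from fractions import Fraction

def det(m):
    """Determinant by exact Gaussian elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [u - factor * v for u, v in zip(m[r], m[c])]
    return sign * result

points = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (-1, 2)]   # hypothetical design points
rows = [[1, x, y, x * x, x * y, y * y] for x, y in points]

# Element (p, q) of the normal-equation matrix: sum over points of column p times column q
normal = [[sum(r[p] * r[q] for r in rows) for q in range(6)] for p in range(6)]

P = det(rows)
a = det(normal)
print(P, a, a == P * P)
```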

In seeking distributions nullifying the cubic bias we have twelve unknowns
$x_1, \cdots, x_6, y_1, \cdots, y_6$ which must satisfy these eight equations. This suggests
that we give arbitrary values to four of them and then solve for the other
eight by straightforward elimination. Unfortunately, since the eight equations
are each of the tenth degree, reducing to the ninth degree when coordinates
of two of the points are given numerical values, a straightforward elimination
would seem to lead to an equation of degree $9^8 = 43,046,721$. The number
of algebraic operations in performing the elimination, solving the equation for
one of the unknowns, substituting back, and solving for the others, would be a
large multiple of this number, and would doubtless be sufficient to occupy a
large and efficient computing project for many millenniums. At the end of this
period it might be found that the roots corresponding to the original arbitrary
values chosen were all complex or made $P = 0$, and were therefore unusable.
Thus indirect and less elementary methods are called for, and some qualitative
investigations of such distributions, if they exist (which is not certain), are in
order.

The set of conditions as a whole is invariant under all non-singular homogeneous
linear transformations of $x$ and $y$, as is easily proved by making linear
combinations of the columns of each of the determinants $\Delta'_{jk}$, $\Delta''_{jk}$ and $P$, and
by making linear combinations of these determinants themselves. These
linear transformations leave the origin invariant. They have four degrees of
freedom, which is exactly the right number to take care of the excess of unknowns
over equations. This points to the possible existence of a finite number
of fundamental solutions, from which all solutions may be obtained by linear
homogeneous transformations. Geometrical properties of the configuration will
be represented by invariants under linear transformations. Thus the condition
$P \ne 0$ means that the six points must not all lie on any conic section. From
this it follows at once that no four of them can lie on a straight line, since this
line, with the line through the other two, would constitute a degenerate conic.
As a matter of fact, we can go further and prove that no three of the points
may lie on a straight line. In the proof of this and other properties of the distribution
it is convenient to use the arbitrariness provided by a linear transformation
to pass the axes (which may be oblique) through any two of the six
points, and then to adjust the scales of measurement so that the coordinates of
these points become (1, 0) and (0, 1), except that one of them might conceivably
be the origin. If three points are collinear, their line can be taken to be the
$x$-axis if it passes through the origin, or the line $y = 1$ if it does not. Even with
the help provided by such procedures the proofs are rather long, though straightforward.
We shall content ourselves here with stating, without proof, the following
properties necessary for sets of six points for which $P \ne 0$ and all components
of the cubic bias vanish:

No three of the points can lie on a straight line.

No two straight lines through the origin can contain four of the points. 

No four of the points can lie on the vertices of a parallelogram. 

The set cannot consist of the origin and the vertices of a regular pentagon with 
center at the origin. 

These conditions have been established by calculations of a rather straight¬ 
forward and laborious sort, too long to be reproduced. 
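
The geometrical meaning of the condition $P \ne 0$ discussed above may be illustrated on hypothetical sets of six points. The sketch checks only that $P$ vanishes when the points lie on a conic or when four of them are collinear; it does not test the vanishing of the cubic bias, nor the four properties just stated.

```python
from fractions import Fraction

def det(m):
    """Determinant by exact Gaussian elimination over Fractions."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [u - factor * v for u, v in zip(m[r], m[c])]
    return sign * result

def P(points):
    """The determinant | 1  x  y  x^2  xy  y^2 | for six points."""
    return det([[1, x, y, x * x, x * y, y * y] for x, y in points])

on_circle = [(5, 0), (-5, 0), (0, 5), (0, -5), (3, 4), (-3, 4)]   # all on x^2 + y^2 = 25
four_in_line = [(0, 1), (1, 1), (2, 1), (3, 1), (0, 0), (1, 2)]   # four points on y = 1
generic = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (-1, 2)]       # on no conic

print(P(on_circle), P(four_in_line), P(generic))
```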

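If $z_k = x_k + iy_k$ and $\bar z_k = x_k - iy_k$, the conditions $P \ne 0$, $\Delta'_{jk} = 0 = \Delta''_{jk}$,
may be written

$|\,1 \;\; z \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| \ne 0, \quad |\,1 \;\; z^j\bar z^k \;\; \bar z \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0, \quad |\,1 \;\; z \;\; z^j\bar z^k \;\; z^2 \;\; z\bar z \;\; \bar z^2\,| = 0.$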

9. Some further unsolved problems. Since it is useful to demarcate the
frontiers of knowledge by pointing out what lies a little outside them as well
as what is within, a few of the many questions may be mentioned which this
paper falls short of answering. Besides the extension to two variables mentioned
in the last section, and to an arbitrary number of variables, it is desirable
that the whole theory should be developed from an exact, or small-sample,
point of view rather than on the basis of the large-sample approximations used
here. This however appears to be an extremely large enterprise. A simpler,
but still quite difficult, problem is to modify the criteria obtained in paragraphs
6 and 7 so as to fit problems of economic experimentation, such as those of
determination of maximum monopoly profit or minimum cost, in which the cost
of each observation consists largely of the lost profit, or excess cost over the
minimum, occasioned by the deviation from the value sought. In such a case
the limitation of cost replaces the limitation of the total number of observations.

Another important problem is to take account of the inaccuracy of the preliminary
information on which the design of the experiment is based, and to
utilize the relations thus involved to design efficient sequences of experiments.

Determination of limits of error in terms of the maxima over an interval of the
derivatives of $f(x)$ should be a fairly straightforward problem in analysis and
have practical importance. With this are associated various problems dealing
with maxima of functions having discontinuities in the first or higher derivatives
at or near the maximum.



An important extension would deal with the case in which the maximum is
estimated from a least-squares polynomial of degree three or more. This might
be connected with the difficult wider problem of deciding on the degree of a
polynomial to be fitted in a particular case.

10. Summary. In determining the value $\xi$ of $x$ for which $f(x)$ is a maximum
or minimum, a quadratic polynomial may be fitted to observations made for
chosen values of $x$. The errors considered are of two kinds: sampling errors
resulting from the inaccuracy in each observation, which diminish as the number
of observations is increased, but increase if the values of $x$ are chosen too close
to the value sought; and biased errors resulting from the fact that $f(x)$ is not
truly quadratic, which do not decrease when the number of observations increases
with a fixed set of values of $x$, but do decrease when the deviations of $x$
from the value sought are reduced. The biased errors may be separated into
components corresponding to the third, fourth and higher powers of $x - \xi$ in
the expansion of $f(x)$, and these components will ordinarily be of diminishing
importance as we go on in the sequence. However it is possible to choose values
of $x$ making the cubic component zero and the quartic component at the same
time a minimum. Such a set consists of only three values of $x$. These values
may be further adjusted to minimize the expectation of the square of the total
error in $\xi$, as far at least as the term of fourth order in the bias, by a proper
balance between the sampling variance and the quartic bias. The values of $x$
satisfying these conditions, measured from the true maximizing or minimizing
value $\xi$, are the products of $[\sigma^2/(N\beta_4^2)]^{1/8}$ by the values $u$ in the table below.
Since the root will usually be extracted by logarithms, the common logarithms
of the values are given. The first set are the most efficient when the frequencies
must be equal. The second set is appropriate when the frequencies are made
proportional to the quantities in the last column; in this case only about 72 per
cent as many observations are required for any specified accuracy as when the
frequencies must be equal. The approximate expected squared errors in the
estimates of $\xi$ in the two cases are given respectively by formulae (44) and (59).
All these results are approximations of the kind appropriate to large numbers of
observations.


   Equal frequencies            Adjustable frequencies

     u       log10 u          u       log10 u     Frequency

  -.6128     -.21267       -.2110     -.67572     .46651 N
   .8695     -.06071        .7877     -.10364     .50000 N
  2.0751      .31704       2.1520      .33284     .03349 N

The signs of $u$ should be reversed if $\beta_2\beta_4(\beta_2\beta_5 - 4\beta_3\beta_4) > 0$. Here $\beta_k$ is the
coefficient of $(x - \xi)^k$ in the expansion of $f(x)$, and $\sigma^2$ is the error variance of
an individual observation. For designing an efficient experiment it is necessary
to have some knowledge of these quantities. It may be gained from preliminary
experiments of smaller scale.

A suitable preliminary experiment, where knowledge of the function is extremely
scanty, might consist of a fixed small number, greater than one, of observations
on $f(x)$ corresponding to each of a set of six or more values of $x$ in
arithmetic progression covering an interval that includes the value $\xi$ sought,
and selected with a view to getting $\xi$ in the center of it as nearly as possible.
A polynomial of the fifth degree at least should be fitted by least squares, in
which process all the quantities desired for the design of the later, larger experiment
can be estimated, together with their accuracies. Since the values of $x$
are taken in arithmetic progression, the fitting can be carried out with extreme
ease by the method of orthogonal polynomials.
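
A preliminary experiment of this kind might be sketched as follows. Everything here is hypothetical: the response function, the error standard deviation, the seven values of $x$ and the two observations at each are invented for illustration, and the fit is made by solving the normal equations directly rather than by the method of orthogonal polynomials mentioned above.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0..degree] obtained from the normal equations."""
    size = degree + 1
    # Augmented matrix of the normal equations
    m = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         + [sum(y * x**i for x, y in zip(xs, ys))] for i in range(size)]
    for c in range(size):                      # Gauss-Jordan elimination, partial pivoting
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(size):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [u - factor * v for u, v in zip(m[r], m[c])]
    return [m[i][size] / m[i][i] for i in range(size)]

def f(x):
    """Hypothetical response with a maximum at x = 0."""
    return 10 - 2 * x**2 + 0.3 * x**3 + 0.5 * x**4

random.seed(0)
levels = [-1.5 + 0.5 * i for i in range(7)]    # seven values in arithmetic progression
xs = [x for x in levels for _ in range(2)]     # two observations at each value
ys = [f(x) + random.gauss(0, 0.05) for x in xs]

coeffs = fit_polynomial(xs, ys, 5)             # estimates of beta_0 ... beta_5
print([round(c, 3) for c in coeffs])
```

The fitted coefficients of the second to fifth powers play the part of rough estimates of $\beta_2, \cdots, \beta_5$ for planning the larger experiment.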

Numerous subsidiary questions promise to have both practical importance 
and mathematical interest. 


REFERENCES

[1] Felix Bernstein, Hartmann and Bauer, Handbuch der Erblichkeitslehre, Vol. 1, Berlin,
1928.

[2] S. O. Rice, "The distribution of the maximum of a random curve," Am. Jour. of Math.,
Vol. 61 (1939), pp. 409-416.

[3] G. W. Elmen, "Magnetic alloys of iron, nickel and cobalt in communication circuits,"
Elec. Eng., Vol. 54 (1935), pp. 1292-1299.

[4] E. E. Schumacher and A. G. Souden, "Some alloys of copper and iron," Metals and
Alloys, Vol. 7 (1938), pp. 95-101.

[5] P. C. Mahalanobis, "A sample survey of the acreage under jute in Bengal, with discussion
on planning of experiments," Proc. 2nd Ind. Stat. Conf., Calcutta, Statistical
Publishing Soc. (1940).

[6] H. G. Romig, Allowable average in sampling inspection, Columbia Univ. thesis, privately
printed, 1939.

[7] J. V. Uspensky, Introduction to mathematical probability, New York, McGraw-Hill,
1937.

[8] R. A. Fisher, The design of experiments, London and Edinburgh, Oliver and Boyd,
1935, Chap. 6.



ON A STATISTICAL PROBLEM ARISING IN ROUTINE ANALYSES 
AND IN SAMPLING INSPECTIONS OF MASS PRODUCTION 

By J. Neyman

University of California, Berkeley, Calif.


1. Introduction.

2. Statistical hypothesis H to be tested.

3. General problem of similar regions.

4. Regions similar to the sample space with regard to $\sigma, \xi_1, \xi_2, \ldots, \xi_N$.

5. The set of hypotheses alternative to H.

6. The best critical region for testing H against a particular alternative.

7. A critical region of an unbiased type.

8. Methods of determining . . .

9. References.


1. Introduction. The words "routine analyses" are used to denote the analyses
performed by laboratories, frequently attached to industrial plants, and
distinguished by the following characteristics: (1) All the analyses or
measurements are of the same kind, for example, are designed to measure the sugar
content in beets or to determine the coordinate of a star. (2) The analyses are
carried out day after day using the same methods and the same instruments.
(3) While all the analyses are of the same kind, the quantity measured varies
from time to time and each such quantity is measured repeatedly $n$ times,
where $n$ represents some small number, 2, 3, 4, 5.

As an illustration we may consider the routine analyses of sugar beets performed
in the process of selection and breeding. A small section is cut out of
each of a great number of sugar beets expected to be suitable for further
breeding. It is crushed and its juice extracted to determine $\xi$, the sugar content, of
each particular beet. From the juice available from each beet $n$ samples are
taken and a determination of the sugar content is made from each. Thus, if
$\xi_i$ represents the sugar content of the section from the $i$th beet and there are
$N$ beets, the laboratory will have to make $nN$ analyses with their results
$x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ representing the measurements of the same quantity $\xi_i$.
Obviously the sugar content $\xi_i$ referring to the $i$th beet need have no relation to
that of any other $j$th beet.

An essential point in the above description is that the number of measurements
referring to the same quantity $\xi_i$ is usually very small. For example, the
quantitative analyses of urine in certain clinics are performed only twice for
each patient, so that $n = 2$. Frequently, various practical considerations make
it impossible to increase this number $n$ of analyses intended to measure the same
quantity.

The smallness of $n$ introduces difficulties in estimating $\xi_i$. It is usual to
consider $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ as independent variables, varying normally about
$\xi_i$ with an unknown standard error $\sigma_i$. If they have to be used to estimate $\xi_i$,
then the confidence interval [1]¹ for $\xi_i$ will be determined by the familiar formula

(1) $x_{i\cdot} - s_i t_\alpha(n) < \xi_i < x_{i\cdot} + s_i t_\alpha(n)$,

where $x_{i\cdot}$ denotes the mean of the $x_{i,j}$,

(2) $s_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2 / n(n - 1)$

and $t_\alpha(n)$ is Fisher's $t$ corresponding to the number of degrees of freedom $n - 1$
and to the chosen confidence coefficient $\alpha$. It is known [2] that if the estimate
of $\xi_i$ is based only on its direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, then the
confidence interval (1) cannot be made any smaller; in fact, formula (1) gives the
shortest unbiased confidence interval for $\xi_i$. But if we try to substitute
appropriate numbers in (1) we get disconcerting results. Namely, if $n = 2$ and
$\alpha = .99$, then $t_\alpha(n) = 63.657$. If $n$ is increased, the value of $t_\alpha(n)$ decreases
rapidly but for $n = 5$ it is still very considerable, $t_\alpha(5) = 4.604$, and consequently
the numerical confidence interval determined by (1) is frequently so broad that
it is devoid of practical value.
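
The size of these multipliers can be checked with a short computation. The sketch below is an illustration only and is not part of the paper: the function names are hypothetical, and the quantile is found by integrating Student's density numerically after the substitution $t = \sqrt{f}\tan\theta$ ($f$ being the number of degrees of freedom), followed by bisection.

```python
import math

def t_cdf_from_angle(theta, df, steps=2000):
    """Student CDF at t = sqrt(df) * tan(theta), 0 <= theta < pi/2.

    With that substitution the density integrates to
    1/2 + k * integral_0^theta cos(u)**(df - 1) du  (Simpson's rule below).
    """
    k = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 + k * total * h / 3

def t_alpha(n, alpha):
    """Two-sided multiplier t_alpha(n) for n - 1 degrees of freedom."""
    df = n - 1
    target = (1 + alpha) / 2
    lo, hi = 0.0, math.pi / 2
    for _ in range(80):  # bisection on the angle theta
        mid = (lo + hi) / 2
        if t_cdf_from_angle(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)

table = {n: t_alpha(n, 0.99) for n in (2, 3, 4, 5)}
for n, t in table.items():
    print(n, round(t, 3))
```

For $n = 2$ and $n = 5$ the computed values agree with the figures 63.657 and 4.604 quoted in the text.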

The general conclusion is that, if $n$ cannot be increased, satisfactory estimates
of $\xi_i$ can only be obtained when they are based on something else in addition to
the direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. This point was first noticed by
"Student" [3]. His method of avoiding the difficulty consists in assuming that
the accuracy of measurements performed in the same laboratory is constant
in time, so that $\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$. If this is true, then $s_0^2 = \sum s_i^2 / N$ will
be an unbiased estimate of the variance of $x_{i\cdot}$, based on $N(n - 1)$ degrees of
freedom. If the past experience of the laboratory is of any size, as measured
by $N$, then the product $N(n - 1)$ will be of considerable size and the confidence
interval for $\xi_i$

(3) $x_{i\cdot} - s_0 t_\alpha(N(n - 1) + 1) < \xi_i < x_{i\cdot} + s_0 t_\alpha(N(n - 1) + 1)$

will be much more satisfactory than (1).
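
A small simulation may make the pooling step concrete. It is only a sketch under stated assumptions: the data are artificial, the true values $\xi_i$ are drawn arbitrarily, the common $\sigma$ is fixed at 1.5, and the function name is hypothetical. It checks that the pooled quantity $s_0^2$ settles near $\sigma^2/n$, the variance of a mean of $n$ measurements.

```python
import random

def pooled_variance_of_mean(samples):
    """s_0^2 = (1/N) * sum_i s_i^2, with s_i^2 as in formula (2)."""
    total = 0.0
    for xs in samples:
        n = len(xs)
        mean = sum(xs) / n
        total += sum((x - mean) ** 2 for x in xs) / (n * (n - 1))
    return total / len(samples)

rng = random.Random(0)          # fixed seed so the run is repeatable
sigma, n, N = 1.5, 2, 5000      # assumed common accuracy, n = 2 analyses each
samples = []
for _ in range(N):
    xi = rng.uniform(10.0, 20.0)  # arbitrary true value for this item
    samples.append([rng.gauss(xi, sigma) for _ in range(n)])

s0_sq = pooled_variance_of_mean(samples)
print(s0_sq, sigma ** 2 / n)    # the two numbers should be close
```

The agreement depends entirely on the assumption that $\sigma$ is the same for every item, which is exactly the hypothesis examined in the rest of the paper.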

The problem which arises is whether we are entitled to assume that $\sigma_1 =
\sigma_2 = \cdots = \sigma_N$. The first study of this problem seems to have been made by
Przyborowski [4] in a paper written in Polish. His findings, subsequently
reported [5] in English, show that, at least in certain cases, the accuracy of routine
analyses is quite difficult to keep constant. If it is not constant, then the
relative frequency of the cases where formula (3) gives correct statements about
$\xi_i$ will generally be different from the expected $\alpha$.


1 Figures in square brackets refer to the literature quoted at the end of the paper. 




The procedure employed by Przyborowski to test whether $\sigma_1 = \sigma_2 = \cdots = \sigma_N$
consisted in considering the quantities $S_i^2 = (n - 1)s_i^2$ and applying the $\chi^2$ test
to see whether they follow the same distribution with $n - 1$ degrees of freedom

(4) $p(S^2) = c\,\sigma^{-(n-1)} (S^2)^{\frac{1}{2}(n-3)} e^{-nS^2/2\sigma^2}$

with an unknown $\sigma$.

Just this point is to be the main subject of this paper. The $\chi^2$ test was
devised by Karl Pearson with no particular set of alternative hypotheses in view.
As a result we may expect that in many cases other tests may be devised which
would be more powerful. A number of such cases are already on record [6],
[7], [8].


2. Statistical hypothesis H to be tested. We shall consider the case where
we can observe the particular values of $Nn$ random variables $x_{i,j}$, $i = 1, 2,
\ldots, N$; $j = 1, 2, \ldots, n$, and we know that $x_{i,j}$ is independent of $x_{k,l}$ for $i \neq k$
and that

(5) $p(x_{i,1}, x_{i,2}, \ldots, x_{i,n}) = \left(\frac{1}{\sigma_i\sqrt{2\pi}}\right)^{n} \exp\left\{-\sum_{j=1}^{n} \frac{(x_{i,j} - \xi_i)^2}{2\sigma_i^2}\right\}$

with unknown values of $\xi_i$ and $\sigma_i > 0$. The hypothesis $H$ to be tested is that
$\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$ without specifying, however, the actual value of $\sigma$.
It will be noticed that this hypothesis has already been treated by a number
of authors [9]-[17]. The need for considering it again arises from the fact that
previously it was tested against the set of alternatives presuming that the $\sigma_1,
\sigma_2, \ldots, \sigma_N$ were positive constants having any values whatsoever. It seems
to the author that, in the present case, the set of alternatives should be different.
This will be explained in the next section. It follows that while the hypothesis
tested is the same as in the papers quoted above, the problem of testing it is
quite different.

Let us denote by $E$ the whole set of $Nn$ observable variables. If $H$ is true
then their elementary probability law will be

(6) $p(E \mid H) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{Nn} \exp\left\{-\sum_{i=1}^{N}\sum_{j=1}^{n} \frac{(x_{i,j} - \xi_i)^2}{2\sigma^2}\right\}$.


3. General problem of similar regions. The development of the test will
follow the general lines explained elsewhere [18], [19], [20]. Denoting by $W$ the
$Nn$ dimensional space of the $x_{i,j}$'s, we want to determine a region $w$ in $W$ having
the following properties: (a) if the hypothesis tested is true then the probability
of $E$ falling in $w$ shall have some fixed value chosen in advance, e.g., $\epsilon = .05$ or
$\epsilon = .01$. This probability is known as the probability of an error of the first
kind. (b) If $H$ is not true then the probability of $E$ falling in $w$ as determined
by one of the alternative hypotheses (that we assume likely to be true when $H$ is
false) shall be as large as possible in a sense that requires further explanation.
The probability with which this condition is concerned is a complement of the
probability of an error of the second kind. Once the region $w$ is chosen it will
be used to test $H$ in this way: if $E$ falls within $w$, then $H$ will be rejected.

In the present section we shall deal only with ways of satisfying condition
(a). The problem is similar to the one recently described by Hotelling [21].
The difficulty is that, if $H$ is true, the probability law of $E$ is given by (6) and
contains $N + 1$ unspecified parameters, "nuisance" parameters as Hotelling
very appropriately calls them. If we take just any region $w$ then it is most
likely that the probability of $E$ falling in it will vary with different values of
$\sigma, \xi_1, \ldots, \xi_N$. As a matter of fact, if we want the test to be absolutely most
powerful, or at least relatively so, we must determine not just one single region
satisfying (a) but actually all such regions or some broad family of them. From
these we shall then select one, which seems most satisfactory from the point of
view of (b).

Systematic methods of determining regions of the above kind have already
been considered [18], [20], [2]. In these publications they are called "similar"
to the sample space $W$. The reason for this term is that the whole space $W$ does
possess the required properties with $\epsilon = 1$. In fact, whatever be the values of
the nuisance parameters, $\sigma, \xi_1, \ldots, \xi_N$, the probability of $E$ falling within $W$,
as calculated from (6), is perfectly determined and equals 1. Our problem is
to find a region $w$, part of $W$, with similar properties for $0 < \epsilon < 1$. However,
in many cases no such regions exist [22].

The general methods in the above publications are applicable in the present
case. However, a recent paper by Cramér and Wold [23] allows a slight
improvement in presenting the matter. As this is a little involved, it seems
desirable to take up the whole problem and present it anew.

Consider then the general case where the probability law of some $m$ observable
variables $y_1, y_2, \ldots, y_m$, say $p(E \mid \theta_1, \ldots, \theta_s)$, as specified by the hypothesis
tested, depends on $s$ nuisance parameters $\theta_1, \theta_2, \ldots, \theta_s$. Our problem will
consist of determining the necessary and sufficient conditions for a region $w$ to
be similar to the sample space with respect to all these parameters. We shall
assume that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ satisfies certain limiting
conditions.

Let

(7) $\varphi_i = \frac{\partial \log p}{\partial \theta_i}$,

(8) $\varphi_{ij} = \frac{\partial^2 \log p}{\partial \theta_i\, \partial \theta_j}$.

Assume that for all values of $i$ and $j = 1, 2, \ldots, s$

(9) $\varphi_{ij} = A_{ij} + \sum_{k=1}^{s} B_{ijk}\varphi_k$,

where the coefficients $A_{ij}$ and $B_{ijk}$ are independent of the observable variables
$E$. Assume also that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ permits indefinite
differentiation under the sign of the integral taken over any fixed region $w$ in $W$.
It is easy to check that the probability law (6) satisfies all of these conditions.

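In order to find the necessary conditions for the region $w$ to be similar to $W$
with respect to $\theta_1, \theta_2, \ldots, \theta_s$, assume that $w$ is actually similar and that,
consequently,

(10) $P\{E \in w \mid \theta_1, \ldots, \theta_s\} = \int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv \epsilon$

for all possible values of $\theta_1, \theta_2, \ldots, \theta_s$. It follows that the derivatives of all
orders with respect to $\theta_1, \theta_2, \ldots, \theta_s$ taken from the left side of (10) must be
identically equal to zero. But we have

(11) $\frac{\partial}{\partial\theta_i}\int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m = \int\cdots\int_w \frac{\partial}{\partial\theta_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m = \int\cdots\int_w \varphi_i\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv 0$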


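for $i = 1, 2, \ldots, s$. Similarly, using (9),

(12) $\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\int\cdots\int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m = \int\cdots\int_w \left(\varphi_i\varphi_j + A_{ij} + \sum_{k=1}^{s} B_{ijk}\varphi_k\right) p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv 0$.

Using (10) and (11), the last identity will be reduced to

(13) $\frac{1}{\epsilon}\int\cdots\int_w \varphi_i\varphi_j\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv -A_{ij}$ for $i, j = 1, 2, \ldots, s$,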


where the right side does not depend on the particular region $w$, provided that
$w$ is similar to the sample space. Considering the identities (11) and (13)
which were obtained by differentiating (10) twice, we may guess what will
happen if we differentiate (13) again and again. We may assume, in fact, that,
whatever be the non-negative integers $k_1, k_2, \ldots, k_s$, we shall obtain

(14) $\frac{1}{\epsilon}\int\cdots\int_w \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, k_2, \ldots, k_s)$,

where $M(k_1, \ldots, k_s)$ is independent of the particular region $w$, provided that $w$
is similar to the sample space with respect to all of the $\theta$'s. Assume that this is
found for all $k$'s such that $\sum k_i \le K$; also assume that the sum of the $k$'s in
(14) is exactly $K$. Differentiating with respect to $\theta_j$, we obtain

(15) $\frac{1}{\epsilon}\int\cdots\int_w \left\{ \varphi_j \prod_{i=1}^{s} \varphi_i^{k_i} + \sum_{t=1}^{s} k_t\, \varphi_t^{k_t - 1}\, \varphi_{tj} \prod_{i \neq t} \varphi_i^{k_i} \right\} p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv \frac{\partial}{\partial\theta_j} M(k_1, \ldots, k_s)$.


Because of the particular form of $\varphi_{tj}$, the second expression in the curly brackets
under the integral is a polynomial in the $\varphi$'s of order not exceeding $K$. According
to the assumption made, this expression multiplied by $p(E \mid \theta_1, \ldots, \theta_s)/\epsilon$ and
integrated over $w$ gives a result which is independent of $w$. As the right side of
(15) is also independent of $w$, we conclude that

(16) $\frac{1}{\epsilon}\int\cdots\int_w \varphi_j \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_j + 1, \ldots, k_s)$

is also independent of the particular similar region chosen. We have seen that
(14) is true for $K \le 2$ and that if it is true for $K$ it is true for $K + 1$, that is,
it is true in general.

We may now sum up our findings: if $w$ is a region similar to the sample space
with respect to all of the $\theta$'s and if $\epsilon$ denotes the value of the integral (10), then,
whatever be the non-negative integers $k_1, k_2, \ldots, k_s$, the value of the integral
on the left side of (14) is independent of the particular region $w$ chosen.

As the whole sample space $W$ is also "similar" with $\epsilon = 1$, it must satisfy this
identity. This allows us to determine the $M$'s, namely

(17) $\int\cdots\int_W \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_s)$.

It is obvious that the necessary condition above is also sufficient. If a region
$w$ is such that (14) holds for all systems of non-negative integers then all the
derivatives of (10) must be identically zero; thus the left side of (10) is
independent of $\theta_1, \theta_2, \ldots, \theta_s$.
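
A toy case, which is not taken from the paper, may help to fix ideas. Assume $m = 3$ independent normal observations with unknown mean $\theta$ and unit variance, so that $\varphi = \sum (y_i - \theta)$, and take as $w$ the set where the sample range exceeds an arbitrary cutoff. The Monte Carlo sketch below (hypothetical names, artificial data) estimates the probability of $w$ and the relative first moment of $\varphi$ given $w$ for two values of $\theta$. Identity (10) suggests the two probabilities should agree, and (11) that the relative first moment should be near zero, which is also the absolute first moment.

```python
import random

def estimate(theta, m=3, cutoff=2.0, trials=40000, seed=1):
    """Return (P{E in w}, mean of phi given w) for w = {range > cutoff}."""
    rng = random.Random(seed)
    hits, phi_sum = 0, 0.0
    for _ in range(trials):
        ys = [rng.gauss(theta, 1.0) for _ in range(m)]
        if max(ys) - min(ys) > cutoff:
            hits += 1
            phi_sum += sum(y - theta for y in ys)  # phi = d log p / d theta
    return hits / trials, phi_sum / hits

p0, rel0 = estimate(0.0, seed=1)
p5, rel5 = estimate(5.0, seed=2)
print(p0, p5)      # probability of w, nearly the same for both theta
print(rel0, rel5)  # relative first moment of phi, near zero in both cases
```

This only illustrates the first of the infinitely many moment conditions; it does not by itself establish similarity.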

It will be useful to interpret the above conditions as follows. We start by
noticing that the left side of (17) represents the product moment of some
specified order of the $\varphi_1, \varphi_2, \ldots, \varphi_s$ considered as random variables. We shall call
it the absolute product moment. We will now interpret the left side of (14)
as a product moment also. For this purpose we shall define a new elementary
probability law of the $y$'s to be denoted by $p(E \mid w, \theta_1, \ldots, \theta_s)$ and described
as the relative probability law given $w$. We shall write it as

(18) $p(E \mid w, \theta_1, \ldots, \theta_s) = \frac{1}{\epsilon}\, p(E \mid \theta_1, \ldots, \theta_s)$

for all of the points $E$ included in $w$ and

(19) $p(E \mid w, \theta_1, \ldots, \theta_s) = 0$

for all other points. With this definition the left side of (14) appears to be the
expectation of the product $\varphi_1^{k_1} \varphi_2^{k_2} \cdots \varphi_s^{k_s}$ calculated from the relative probability
law of the $y$'s given $w$. We will call it the relative product moment given $w$.
The final result can now be stated as follows:

For a region $w$ to be similar to the sample space with respect to $\theta_1, \theta_2, \ldots, \theta_s$
it is necessary and sufficient that all the relative moments and product moments
of $\varphi_1, \varphi_2, \ldots, \varphi_s$ shall equal the corresponding absolute moments and product
moments.

In order to make the method of constructing similar regions according to the
above conditions clear we recall the procedure followed in the calculation of the
probability laws of any given set of random variables.

Assume then that the elementary probability law of the original variables $y$ is
given. Fix some values of the parameters $\theta_1, \theta_2, \ldots, \theta_s$, denote the resulting
probability law by $p(E)$, and consider the problem of finding the elementary
probability law of $\varphi_1, \varphi_2, \ldots, \varphi_s$ considered as functions of the $y$'s. We shall
assume that none of the $\varphi$'s can be expressed as a function of the others not
involving the $y$'s explicitly so that the matrix

(20) $\left\| \frac{\partial \varphi_i}{\partial y_j} \right\|$, $i = 1, 2, \ldots, s$; $j = 1, 2, \ldots, m$,

is non-singular. In these circumstances it is possible to select $m - s$ functions
of the $y$'s, say $f_{s+1}, f_{s+2}, \ldots, f_m$, which have continuous second derivatives such
that the formulae

(21) $z_i = \varphi_i$ for $i = 1, 2, \ldots, s$; $z_j = f_j$ for $j = s + 1, s + 2, \ldots, m$

determine a one-to-one transformation of the space $W$ of the $y$'s into the space
$W'$ of the $z$'s. If $w$ denotes any region in $W$ then it will be transformed into a
perfectly determined region $w'$ in $W'$. If $E'$ denotes a point in $W'$ then the
probability of $E'$ falling in $w'$ will be identical with that of $E$ falling in $w$. Thus

(22) $P\{E' \in w'\} \equiv P\{E \in w\} = \int\cdots\int_w p(E)\, dy_1 \cdots dy_m$.

Letting $J$ be the Jacobian of the $y$'s with respect to the $z$'s in the transformation
(21) and using the known formulae for transforming multiple integrals, we have

(23) $P\{E' \in w'\} = \int\cdots\int_{w'} p(E)\big]_z\, |J|\, dz_1 \cdots dz_m$,

where $p(E)\big]_z$ denotes the result of substituting the expressions for the $y$'s in
terms of the $z$'s as obtained from (21) into $p(E)$. It follows that, whatever be
the region $w'$ in $W'$, the probability of $E'$ falling in it is obtained by integrating
the function $p(E)\big]_z\, |J|$ over $w'$. But this means, according to the usual
definition, that the product $p(E)\big]_z\, |J|$ is the elementary probability law of
the $z$'s. Denoting it by $p(E') = p(z_1, \ldots, z_m)$ we have

(24) $p(E') = p(E)\big]_z\, |J|$.




Now, to obtain the joint probability law of $\varphi_1, \varphi_2, \ldots, \varphi_s$ or that of $z_1,
z_2, \ldots, z_s$ we must integrate $p(E')$ for all the other $z$'s between their extreme
limits, formally between $-\infty$ and $+\infty$ for each of the variables concerned,

(25) $p(z_1, \ldots, z_s) = \int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} p(E')\, dz_{s+1} \cdots dz_m$.

This procedure will be applied when calculating the absolute probability law
of the $\varphi$'s and also the relative one given $w$. The only difference will be that in
the latter case we shall have to start with (18) and (19) instead of the original
probability law. The space $W'$ and the transformation (21) will be the same
in both cases. It is important to be clear about the difference between the two
cases. This is connected with the difference between $p(E \mid \theta_1, \ldots, \theta_s)$ and
$p(E \mid w, \theta_1, \ldots, \theta_s)$ of (18) and (19). The latter is proportional to the former
at any point $E$ within the region $w$ but is zero outside of $w$. As mentioned
above, the integrations for $z_{s+1}, z_{s+2}, \ldots, z_m$ in (25) should extend formally
from $-\infty$ to $+\infty$ for each variable. However, the probability law $p(E')$ may
equal zero within certain parts of this range. Fixing any system of values
$z_i = \varphi_i$, for $i = 1, 2, \ldots, s$, is equivalent to fixing a hypersurface in the space $W$
and considering the intersection of planes $z_i$ = constant in the space $W'$.
Denote them by $W(\varphi)$ and $W'(\varphi)$ respectively. If we shift the point $E$ or $E'$
along $W(\varphi)$ or $W'(\varphi)$ respectively, the variables $z_j = f_j$, for $j = s + 1,
s + 2, \ldots, m$ will assume a certain set $S(\varphi)$ of systems of values. When
calculating the absolute probability law of $\varphi_1, \ldots, \varphi_s$ this set $S(\varphi)$ will be the real
region of integration in (25); outside of it the function under the integral sign
will be zero. On the other hand, when calculating the relative probability law
of $\varphi_1, \ldots, \varphi_s$ given $w$, the function under the integral (25) is zero as soon as
the point $E$ moves outside of the region $w$. Denote by $w(\varphi)$ that part of $W(\varphi)$
which is included in $w$ and by $w'(\varphi)$ the corresponding part of $W'(\varphi)$. So, the
absolute and the relative, given $w$, probability laws of $\varphi_1, \ldots, \varphi_s$ can be
obtained by using the formulae

(26) $p(\varphi_1, \ldots, \varphi_s) = \int\cdots\int_{W'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m$

(27) $p(\varphi_1, \ldots, \varphi_s \mid w) = \frac{1}{\epsilon}\int\cdots\int_{w'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m$.

Now the method of constructing regions similar to $W$ with respect to $\theta_1,
\theta_2, \ldots, \theta_s$ is clear: to construct any such region it is necessary and sufficient
to select for each of all possible systems of values of $\varphi_1, \varphi_2, \ldots, \varphi_s$ a part $w(\varphi)$
of the hypersurface $W(\varphi)$ and to combine all these parts. The selection of $w(\varphi)$
is arbitrary save for the restriction that the probability law (27) have all its
moments equal to those of (26), identically in the $\theta$'s. This last condition will
certainly be satisfied if $w(\varphi)$ is so selected that for almost all systems of values
of $\varphi_1, \varphi_2, \ldots, \varphi_s$

(28) $p(\varphi_1, \ldots, \varphi_s \mid w) \equiv p(\varphi_1, \ldots, \varphi_s)$

for all values of the $\theta$'s.

By selecting $w(\varphi)$ in all possible ways that satisfy (28) we obtain an infinity of
regions similar to $W$ with respect to $\theta_1, \theta_2, \ldots, \theta_s$. They form a family which
we shall denote by $F(\epsilon)$. However, it is known that in general all the moments
of $p(\varphi_1, \ldots, \varphi_s \mid w)$ and $p(\varphi_1, \ldots, \varphi_s)$ may be identical without the two
probability laws being equal almost everywhere. In such cases, the family $F(\epsilon)$ will
not exhaust all the similar regions. It is important to be able to state whether
or not $F(\epsilon)$ contains all the similar regions. To ascertain this we may use the
conditions of Cramér and Wold [23] which are sufficient for the determinateness
of the problem of moments, that is, for the uniqueness of a function having a
given set of moments.

Let

(29) $\mu_\nu = M(\nu, 0, 0, \ldots, 0) + M(0, \nu, 0, \ldots, 0) + \cdots + M(0, 0, \ldots, 0, \nu)$.

With this notation the conditions of Cramér and Wold can be stated as follows:
If any two probability laws, e.g., the probability laws $p(\varphi_1, \ldots, \varphi_s \mid w)$ and
$p(\varphi_1, \ldots, \varphi_s)$, have all their moments and all their product moments identical
and if the series

(30) $\sum_{\nu=1}^{\infty} \mu_{2\nu}^{-1/2\nu}$

is divergent, then

(31) $p(\varphi_1, \ldots, \varphi_s \mid w) \equiv p(\varphi_1, \ldots, \varphi_s)$

almost everywhere.

Therefore, to know whether the family $F(\epsilon)$ defined above exhausts all the
regions similar to $W$, we must calculate the even moments of all the $\varphi$'s and see
whether the series (30) depending on these moments is divergent. If it is, there
is no similar region besides the family $F(\epsilon)$. Otherwise, there may be some
others. These others will be constructed by selecting $w(\varphi)$'s such that the
integral (27) equals any other probability law having the same moments as (26).
In such cases, a region $w$ selected, in one way or another, from the family $F(\epsilon)$
as the best from the point of view of controlling errors of the second kind will
only be the relative best.

It should be mentioned that whether we can always, under the conditions
considered, select a $w(\varphi)$ on any $W(\varphi)$ that satisfies the identity (28) has not
yet been proved. However, it seems plausible that the differential equations (9)
imply the existence of a sufficient set of statistics for $\theta_1, \theta_2, \ldots, \theta_s$. If this is
so, the possibility of satisfying (28) is guaranteed (see [2], p. 366).



4. Regions similar to the sample space with respect to $\sigma, \xi_1, \xi_2, \ldots, \xi_N$.
We may now return to the original problem and apply our theory to the
probability law (6). We wish to construct the most general regions similar to the
sample space with respect to the nuisance parameters $\sigma, \xi_1, \ldots, \xi_N$ unspecified
by the hypothesis tested. We let


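(32) $\varphi_\sigma = \frac{\partial \log p}{\partial \sigma} = -\frac{Nn}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}\sum_{j=1}^{n} (x_{i,j} - \xi_i)^2$,

(33) $\varphi_i = \frac{\partial \log p}{\partial \xi_i} = \frac{n(x_{i\cdot} - \xi_i)}{\sigma^2}$.

Then

(34) $\frac{\partial \varphi_\sigma}{\partial \sigma} = -\frac{3}{\sigma}\varphi_\sigma - \frac{2Nn}{\sigma^2}$, $\quad \frac{\partial \varphi_\sigma}{\partial \xi_i} = -\frac{2}{\sigma}\varphi_i$, $\quad \frac{\partial \varphi_i}{\partial \xi_i} = -\frac{n}{\sigma^2}$, $\quad \frac{\partial \varphi_i}{\partial \xi_j} = 0$, $i \neq j$,

and we see that the probability law (6) satisfies the differential equations (9).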
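
The relations (34) can be spot-checked numerically. The sketch below is an illustration only, with made-up data for $N = 2$, $n = 3$ and hypothetical function names: it compares central finite differences of $\varphi_\sigma$ and $\varphi_i$, as given by (32) and (33), with the right sides of (34).

```python
def phi_sigma(x, xi, sigma):
    """(32): -Nn/sigma + sum_ij (x_ij - xi_i)^2 / sigma^3."""
    N, n = len(x), len(x[0])
    q = sum((x[i][j] - xi[i]) ** 2 for i in range(N) for j in range(n))
    return -N * n / sigma + q / sigma ** 3

def phi_xi(x, xi, sigma, i):
    """(33): n * (mean of row i - xi_i) / sigma^2."""
    n = len(x[i])
    return n * (sum(x[i]) / n - xi[i]) / sigma ** 2

x = [[4.1, 3.7, 4.4], [9.8, 10.5, 10.1]]   # made-up measurements
xi, sigma, h = [4.0, 10.0], 0.8, 1e-5      # assumed parameter values, step size
N, n = len(x), len(x[0])
xi_up, xi_dn = [xi[0] + h, xi[1]], [xi[0] - h, xi[1]]

# d(phi_sigma)/d(sigma) against -(3/sigma) phi_sigma - 2Nn/sigma^2
d_sigma = (phi_sigma(x, xi, sigma + h) - phi_sigma(x, xi, sigma - h)) / (2 * h)
rhs_sigma = -3 / sigma * phi_sigma(x, xi, sigma) - 2 * N * n / sigma ** 2
# d(phi_sigma)/d(xi_1) against -(2/sigma) phi_1
d_xi = (phi_sigma(x, xi_up, sigma) - phi_sigma(x, xi_dn, sigma)) / (2 * h)
rhs_xi = -2 / sigma * phi_xi(x, xi, sigma, 0)
# d(phi_1)/d(xi_1) against -n/sigma^2, and d(phi_2)/d(xi_1) against 0
d_ii = (phi_xi(x, xi_up, sigma, 0) - phi_xi(x, xi_dn, sigma, 0)) / (2 * h)
rhs_ii = -n / sigma ** 2
d_ij = (phi_xi(x, xi_up, sigma, 1) - phi_xi(x, xi_dn, sigma, 1)) / (2 * h)

print(d_sigma, rhs_sigma)
print(d_xi, rhs_xi)
print(d_ii, rhs_ii, d_ij)
```

Each right side is linear in the $\varphi$'s with coefficients free of the observations, which is the form required by (9).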

Now the hypersurfaces $W(\varphi)$ of the theory are the intersections of the
hypersurfaces

(35) $\varphi_\sigma$ = constant and $\varphi_i$ = constant, for $i = 1, 2, \ldots, N$.

The latter equations are clearly equivalent to

(36) $x_{i\cdot}$ = constant.

As to the former, we notice the identity

(37) $\sum_{i=1}^{N}\sum_{j=1}^{n} (x_{i,j} - \xi_i)^2 = n\sum_{i=1}^{N} \left(S_i^2 + (x_{i\cdot} - \xi_i)^2\right) = \chi^2$, (say)

where $nS_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2$. Therefore, $W(\varphi)$ denotes the intersection of the
hypersurfaces (36) with, say,

(38) $T_1 = \sum_{i=1}^{N} S_i^2$ = constant.
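
The decomposition (37) is easy to confirm on a small example. The following sketch uses made-up numbers ($N = 2$, $n = 3$) and hypothetical variable names; it computes both sides of (37) and the statistic $T_1$ of (38).

```python
x = [[4.1, 3.7, 4.4], [9.8, 10.5, 10.1]]   # made-up measurements x_ij
xi = [4.0, 10.0]                           # assumed true values xi_i
n = len(x[0])

# Left side of (37): total squared deviation from the true values.
lhs = sum((xij - xi[i]) ** 2 for i, row in enumerate(x) for xij in row)

means = [sum(row) / n for row in x]        # x_i.
# S_i^2 is defined by n * S_i^2 = sum_j (x_ij - x_i.)^2
S_sq = [sum((xij - m) ** 2 for xij in row) / n for row, m in zip(x, means)]
rhs = n * sum(s + (m - t) ** 2 for s, m, t in zip(S_sq, means, xi))

T1 = sum(S_sq)                             # statistic (38)
print(lhs, rhs, T1)
```

Since the means are held fixed by (36), fixing the left side of (37) is the same as fixing $T_1$.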


If we succeed in selecting from each hypersurface $W(\varphi)$ a part $w(\varphi)$ satisfying
condition (28) identically then the sum of all such regions $w(\varphi)$ will form a
region $w$ similar to $W$ with respect to all the unspecified parameters and
belonging to the family $F(\epsilon)$. Before proceeding to this stage of the solution, let us see
whether the family $F(\epsilon)$ exhausts all of the similar regions.



For this purpose notice first, that instead of considering whether there is but
one probability law with moments equal to those of $\varphi_\sigma$ and the $\varphi_i$'s, it is
sufficient to concern ourselves with the moments of $\chi^2$ and $x_{i\cdot}$. In fact, all the $\varphi$'s
are functions of these variables and the problem of uniqueness of the distribution
must have the same answer in both cases. The $2\nu$th absolute moment of $\chi^2$
as calculated from (6) equals

(39) $(2\sigma^2)^{2\nu}\, \Gamma(\tfrac{1}{2}Nn + 2\nu) / \Gamma(\tfrac{1}{2}Nn)$.

The same order moment of $x_{i\cdot}$ is

(40) $\sigma^{2\nu} (2\nu)! / (2n)^{\nu}\, \nu!$.

Thus, the quantity denoted by $\mu_{2\nu}$ in the theory becomes

(41) $\mu_{2\nu} = (2\sigma^2)^{2\nu}\, \frac{\Gamma(\tfrac{1}{2}Nn + 2\nu)}{\Gamma(\tfrac{1}{2}Nn)} + N \left(\frac{\sigma^2}{2n}\right)^{\nu} \frac{(2\nu)!}{\nu!}$.

We are interested in whether or not the series (30) is divergent. Since $\mu_{2\nu}$
satisfies the inequality

(42) $\mu_{2\nu} < a^{2\nu}\, \Gamma(b + 2\nu) = C_\nu^{2\nu}$, (say)

with $a = 2\sigma^2 + N$ and $2b = Nn$, if we prove that the series $\sum C_\nu^{-1}$ diverges, then
(30) also diverges. To settle this conveniently we apply Stirling's formula to
$\Gamma(b + 2\nu)$ and find that, as $\nu \to \infty$, the ratio $C_\nu/\nu$ tends to a finite limit. As
the series $\sum \nu^{-1}$ is divergent, so is the series $\sum C_\nu^{-1}$, and thus the series $\sum \mu_{2\nu}^{-1/2\nu}$ is
divergent. Therefore, there is but one probability law with moments identical
to those of $\chi^2$ and the $x_{i\cdot}$'s and so the family $F(\epsilon)$ contains all the regions similar
to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$.
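
The behaviour of the terms of the series can also be seen numerically. The sketch below is only an illustration and proves nothing: it takes $\mu_{2\nu}$ to be the sum of the moment (39) and $N$ times the moment (40), with arbitrarily chosen $\sigma = 1$, $N = 3$, $n = 2$, works with logarithms to avoid overflow, and prints $\nu\,\mu_{2\nu}^{-1/2\nu}$ for growing $\nu$. The function names are hypothetical.

```python
import math

def log_mu_2nu(nu, sigma, N, n):
    """Logarithm of mu_{2 nu}, assumed to be (39) plus N times (40)."""
    b = N * n / 2
    t1 = (2 * nu * math.log(2 * sigma ** 2)
          + math.lgamma(b + 2 * nu) - math.lgamma(b))
    t2 = (math.log(N) + 2 * nu * math.log(sigma) + math.lgamma(2 * nu + 1)
          - nu * math.log(2 * n) - math.lgamma(nu + 1))
    hi, lo = max(t1, t2), min(t1, t2)
    return hi + math.log1p(math.exp(lo - hi))   # log(e^t1 + e^t2)

def term(nu, sigma=1.0, N=3, n=2):
    """One term mu_{2 nu}^(-1/(2 nu)) of the series (30)."""
    return math.exp(-log_mu_2nu(nu, sigma, N, n) / (2 * nu))

scaled = [nu * term(nu) for nu in (10, 100, 1000, 5000)]
print(scaled)   # settles near a positive constant
```

With these values the scaled terms approach roughly $e/4$, so the terms themselves decrease like a constant times $1/\nu$, in line with the comparison with the harmonic series made in the text.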

It may now be interesting to go into some details of the effective construction
of any region similar to $W$ with respect to $\sigma, \xi_1, \ldots, \xi_N$. For this purpose it
is convenient to go back and express the identity (28), that the regions $w(\varphi)$
must satisfy, in terms of the relative probability law of $z_{s+1}, z_{s+2}, \ldots, z_m$ given
$\varphi_1, \varphi_2, \ldots, \varphi_s$. This is denoted by $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ and
defined for every system of values of the $\varphi$'s for which $p(\varphi_1, \varphi_2, \ldots, \varphi_s) \neq 0$ as
follows:

(43) $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s) = p(\varphi_1, \ldots, \varphi_s, z_{s+1}, \ldots, z_m) / p(\varphi_1, \ldots, \varphi_s)$.

Using (26), (27), and (43), the identity (28) can be rewritten in the following form

(44) $\int\cdots\int_{w'(\varphi)} p(z_{s+1}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)\, dz_{s+1} \cdots dz_m \equiv \epsilon$.

The function under this integral is the relative elementary probability law
of $z_{s+1}, z_{s+2}, \ldots, z_m$ and it is integrated over the region $w'(\varphi)$. Therefore, the
left side of (44) is nothing but the relative probability of the point $E'$ falling in
$w'(\varphi)$ given that the first $s$ of its coordinates have the fixed values $\varphi_1, \ldots, \varphi_s$.
In other words, and owing to the one-to-one correspondence between the spaces
$W$ and $W'$, we have

(45) $P\{E' \in w'(\varphi) \mid E' \in W'(\varphi)\} = P\{E \in w(\varphi) \mid E \in W(\varphi)\} \equiv \epsilon$.

Now the general method of determining similar regions may be stated as
follows:

1. Choose any system of variables $z_{s+1}, z_{s+2}, \ldots, z_m$ such that their values
determine uniquely the position of the point $E'$ on any fixed hypersurface $W'(\varphi)$.
These $z$'s considered as functions of the $y$'s should be continuously differentiable
twice.

2. Find the relative probability law of the $z$'s given the $\varphi$'s. This must be
done for every possible set of values of the $\varphi$'s.

3. In the space of $z_{s+1}, z_{s+2}, \ldots, z_m$ consider regions which satisfy the equality
(44) identically in the $\theta$'s. Any such region could be taken to form a part of $w'$,
the region similar to the sample space, which we are trying to construct. If
the assumption that the differential equations (9) imply the existence of a
sufficient system of statistics for $\theta_1, \theta_2, \ldots, \theta_s$ is true, then (see [2], p. 366) the
probability law $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ will be independent of the
$\theta$'s and there will be an infinity of regions satisfying (44).

Obviously, instead of dealing directly with $\varphi_1, \varphi_2, \ldots, \varphi_s$ as described above,
we may select any system of statistics $T_1, T_2, \ldots, T_s$ such that the system of
equations $T_i$ = constant is equivalent to $\varphi_i$ = constant, for $i = 1, 2, \ldots, s$.

Returning to the particular problem of similar regions with respect to $\sigma,
\xi_1, \ldots, \xi_N$, we notice that instead of the $\varphi$'s we may consider

(46) $T_1 = \sum_{i=1}^{N} S_i^2$ and $T_{i+1} = x_{i\cdot}$ for $i = 1, 2, \ldots, N$.

Now we wish to select a convenient system of variables, denoted by $z_{s+j}$'s in
the theory above, to determine the position of the point $E'$ on any hypersurface
$W'(\varphi)$ where all the functions (46) have fixed values. Obviously there is no
unique choice and we shall use what we find convenient. But notice that the
total number of these variables should be, in our case, $Nn - N - 1$. The
following system may be suggested.

If the sum $\sum S_i^2$ has a fixed value $T_1$ then none of the $S_i^2$ can exceed $T_1$. Write

(47) $S_i^2 = u_i T_1$ for $i = 1, 2, \ldots, N - 1$, $\quad S_N^2 = \left(1 - \sum_{i=1}^{N-1} u_i\right) T_1$

and consider $u_1, u_2, \ldots, u_{N-1}$ as belonging to the system of variables sought.
The region of their variation is determined by the inequalities

(48) $0 < u_i$ and $\sum_{i=1}^{N-1} u_i < 1$.



If the $u$'s are fixed then they, together with the value of $T_1$, determine the
values of $S_1, S_2, \ldots, S_N$. As the values of $x_{i\cdot} = T_{i+1}$ are already fixed, we
have to solve the problem of choosing for each $i = 1, 2, \ldots, N$ a system of
$n - 2$ variables, say $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$, which with $x_{i\cdot}$ and $S_i$ will completely
determine the values of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. However, this will only have to
[...], we may determine the $z_{i,j}$ in two consecutive steps. First write

(49)
$x_{i,1} = x_{i\cdot} + \frac{1}{\sqrt{1 \cdot 2}}\, v_{i,1} + \frac{1}{\sqrt{2 \cdot 3}}\, v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}\, v_{i,n-1}$
$x_{i,2} = x_{i\cdot} - \frac{1}{\sqrt{1 \cdot 2}}\, v_{i,1} + \frac{1}{\sqrt{2 \cdot 3}}\, v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}\, v_{i,n-1}$
$x_{i,3} = x_{i\cdot} - \frac{2}{\sqrt{2 \cdot 3}}\, v_{i,2} + \cdots + \frac{1}{\sqrt{(n-1)n}}\, v_{i,n-1}$
$\cdots$
$x_{i,n} = x_{i\cdot} - \frac{n-1}{\sqrt{(n-1)n}}\, v_{i,n-1}$

where $v_{i,1}, v_{i,2}, \ldots, v_{i,n-1}$ are new variables satisfying the identity

(50) $\sum_{j=1}^{n-1} v_{i,j}^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2$.
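
The following sketch checks the identity (50) numerically for a transformation of the kind written in (49). Because the printed formulae are badly damaged in this copy, the coefficients used here are an assumption (the usual Helmert-type coefficients $1/\sqrt{k(k+1)}$ and $-k/\sqrt{k(k+1)}$), and the function name is hypothetical.

```python
import math

def x_from_mean_and_v(mean, v):
    """Rebuild x_1..x_n from the mean and v_1..v_{n-1}.

    Assumed coefficient of v_k in x_j: 1/sqrt(k(k+1)) for j <= k,
    -k/sqrt(k(k+1)) for j = k + 1, and 0 for j > k + 1.
    """
    n = len(v) + 1
    xs = []
    for j in range(1, n + 1):
        total = mean
        for k in range(1, n):
            scale = math.sqrt(k * (k + 1))
            if j <= k:
                total += v[k - 1] / scale
            elif j == k + 1:
                total -= k * v[k - 1] / scale
        xs.append(total)
    return xs

v = [0.3, -1.2, 0.7, 2.0]                  # arbitrary values, so n = 5
xs = x_from_mean_and_v(5.0, v)
mean = sum(xs) / len(xs)
sum_sq_dev = sum((x - mean) ** 2 for x in xs)
sum_sq_v = sum(t * t for t in v)
print(mean, sum_sq_dev, sum_sq_v)          # mean is 5.0; the two sums agree
```

The mean is reproduced and the sum of squared deviations equals the sum of squares of the $v$'s, which is all that the later steps require of the transformation.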

We transform them further by putting

(51)
$v_{i,1} = \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \cos z_{i,1}$
$v_{i,2} = \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \sin z_{i,1}$
$v_{i,3} = \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \sin z_{i,2}$
$\cdots$
$v_{i,n-1} = \sqrt{n}\, S_i \sin z_{i,n-2}$

with the $z$'s varying as follows

(52) $0 \le z_{i,1} < 2\pi$, $\quad -\pi/2 < z_{i,j} < \pi/2$ for $j = 2, 3, \ldots, n - 2$.

Of course, instead of the $S_i$ we should put their expressions in terms of $T_1$ and
the $u$'s into (51). With the exception of a set of measure zero, which can be
ignored, the formulae above determine a one-to-one transformation of the
original space $W$ of the $x$'s into the space $W'$ of $T_1, T_2, \ldots, T_{N+1}$, $u_1, \ldots,
u_{N-1}$, and $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$ for $i = 1, 2, \ldots, N$.
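
As a check on the angular substitution (51), the sketch below builds the $v$'s from a value of $S_i$ and a set of angles and confirms that their squares add up to $nS_i^2$, as the definition $nS_i^2 = \sum_j (x_{i,j} - x_{i\cdot})^2$ together with (50) requires. The function name and the numerical inputs are hypothetical.

```python
import math

def v_from_angles(n, S, z):
    """v_1..v_{n-1} from S and angles z = [z_1, ..., z_{n-2}], as in (51)."""
    r = math.sqrt(n) * S
    v = []
    for k in range(1, n):                  # k = 1, ..., n - 1
        val = r
        for t in range(k, n - 1):          # cos z_t for t = k, ..., n - 2
            val *= math.cos(z[t - 1])
        if k >= 2:
            val *= math.sin(z[k - 2])      # sin z_{k-1}
        v.append(val)
    return v

n, S = 5, 0.9                              # arbitrary illustration values
v = v_from_angles(n, S, [1.0, 0.3, -0.4])
print(sum(t * t for t in v), n * S ** 2)   # both equal n * S^2
```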

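In calculating the joint probability law of all the new variables, we notice
that, on the hypothesis tested, all the $Nn$ original variables are mutually
independent. Consequently, the transformations (49) and (51), which refer to
separate groups of the $x_{i,j}$'s, corresponding to fixed values of $i$, could be carried
through separately. In doing so, we use formulae deduced elsewhere (see [5],
pp. 38-39) directly and obtain

(53) $p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2}) = \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} S_i^{n-2}\, e^{-\frac{1}{2}n\left(S_i^2 + (x_{i\cdot} - \xi_i)^2\right)/\sigma^2} \prod_{j=2}^{n-2} \cos^{j-1} z_{i,j}$.

It follows that

(54) $p(x_{1\cdot}, \ldots, x_{N\cdot}, S_1, \ldots, S_N, z_{1,1}, \ldots, z_{N,n-2}) = \prod_{i=1}^{N} p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2})$
$= \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\frac{1}{2}n \sum_{i=1}^{N} \left(S_i^2 + (x_{i\cdot} - \xi_i)^2\right)/\sigma^2} \prod_{i=1}^{N} S_i^{n-2} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}$.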


We now wish to introduce $T_1$ and the $u_i$ instead of the $S_i$'s. Since all other
variables remain unchanged the Jacobian of this transformation reduces to that
of (47). Simple calculations show that

(55) $\frac{\partial(S_1, S_2, \ldots, S_N)}{\partial(T_1, u_1, \ldots, u_{N-1})} = 2^{-N}\, T_1^{\frac{1}{2}N - 1} \left(1 - \sum_{i=1}^{N-1} u_i\right)^{-\frac{1}{2}} \prod_{i=1}^{N-1} u_i^{-\frac{1}{2}}$.


Using this expression and substituting (47) in (54) we finally obtain 
P( X !■ > * ’ ' » X N> , T\,Ul, • • * , UjV-l, Zl.X, * • Ztfjt-i) 


(50) 


( 4 ) 


*( Vn_Y n 

\<r y/2irJ 


0-1" t» >*/»* jiitfCn—D—1 


Hnri/<r* 


// \ w-l \JCn— 3 > N n-i 

XV, 1 ” S E5 U< ) S JJ C0S ^ 1 


To obtain the relative probability law of $u_1, u_2, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$
given $T_1$ and the $T_{i+1} = x_{i\cdot}$, we must calculate $p(T_1, T_2, \ldots, T_{N+1})$ and
divide expression (56) by it. Of course, $p(T_1, T_2, \ldots, T_{N+1})$ is obtained from
(56) by integrating over the whole of $W'(\varphi)$, that is, for all other variables
between the extreme limits of their variation. As these limits are independent of
the values of $T_1, T_2, \ldots, T_{N+1}$, the result will be

(57)  $p(T_1, T_2, \ldots, T_{N+1}) = c\,\sigma^{-Nn}\, T_1^{\frac12 N(n-1) - 1}\, e^{-\frac12 n (T_1 + \sum_{i=1}^{N} (T_{i+1} - \xi_i)^2)/\sigma^2}$

where c denotes a constant. Thus

(58)  $p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1}) = c_1 \Big(\big(1 - \sum_{i=1}^{N-1} u_i\big) \prod_{i=1}^{N-1} u_i\Big)^{\frac12 (n-3)} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}$

with the region of variation $W'(\varphi)$ limited by the following inequalities

(59)
$0 < u_i, \quad \sum_{i=1}^{N-1} u_i < 1$
$0 \le z_{k,1} < 2\pi$  for $k = 1, 2, \ldots, N,$
$-\pi/2 < z_{k,j} < \pi/2$  for $j = 2, 3, \ldots, n-2.$

Since (58) integrated over $W'(\varphi)$ is identically unity, $c_1$ is a purely numerical
constant.

Now to construct any region w similar to the sample space with respect to $\sigma$,
$\xi_1, \ldots, \xi_N$, we must select, separately for each and all systems $(\varphi)$ of values
of $T_1, T_2, \ldots, T_{N+1}$, a region $w(\varphi)$, part of $W'(\varphi)$ as defined by (59), with the
sole restriction that

(60)  $\int \cdots \int_{w(\varphi)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \epsilon.$

Obviously, there is an infinity of ways of selecting any single one of such
regions. For example, we could let the u's vary as indicated in (59) and limit
the z's by

(61)  $0 < z_{k,1} < a, \quad -a < z_{k,j} < a \qquad (k = 1, 2, \ldots, N;\; j = 2, 3, \ldots, n-2)$

where a is chosen so that (60) is satisfied. This choice of $w(\varphi)$ may correspond
to one particular system of values of $T_1, T_2, \ldots, T_{N+1}$ and no other. Again,
the same region (61) may be chosen to serve for all systems of values of the T's.
In this case, the region $w = \sum_{\varphi} w(\varphi)$ might be described as cylindrical. Any
such region w will control errors of the first kind in testing H to the same level
of significance $\epsilon$ and, as far as these errors alone are concerned, each of these
regions is of equal value. Whatever the choice of the regions $w(\varphi)$, the
test of H will consist of (1) observing the values of the x's, (2) calculating the
corresponding values of $T_1, T_2, \ldots, T_{N+1}$, the u's, and the z's, and (3) noting
whether the point with coordinates $u_1, u_2, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ falls in
the region $w(\varphi)$ chosen to correspond to the observed values of $T_1, T_2, \ldots, T_{N+1}$.
Of course, in practical cases, the choice of $w(\varphi)$ for one system of values
of the T's will not be quite unconnected with that for others. On the contrary,
there will probably be some more or less simple rule connecting $w(\varphi)$ with the
corresponding systems of the T's. As a result, the actual machinery of the test
will be much simpler than that described above and will consist of the calculation
of only a very few functions of the x's and in checking some simple
inequalities.
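
Step (2) of the test can be sketched in a few lines. The sketch assumes the conventions used throughout this section: $S_i^2$ is the mean squared deviation within the ith group, $T_1 = \sum S_i^2$, $T_{i+1} = x_{i\cdot}$ and $u_i = S_i^2/T_1$. The function name and the sample figures are hypothetical.

```python
def sufficient_statistics(groups):
    """groups: list of N lists, each holding the n measurements of one group.

    Returns (T, u) with T = [T_1, ..., T_{N+1}] and u = [u_1, ..., u_{N-1}].
    """
    means = [sum(g) / len(g) for g in groups]
    # S_i^2 = (1/n) * sum_j (x_ij - mean_i)^2
    s2 = [sum((x - m) ** 2 for x in g) / len(g) for g, m in zip(groups, means)]
    t1 = sum(s2)
    u = [s / t1 for s in s2[:-1]]    # the last ratio is 1 - sum(u)
    return [t1] + means, u

# Three groups of three analyses each (made-up numbers).
groups = [[1.0, 2.0, 3.0], [2.0, 2.0, 5.0], [0.0, 4.0, 2.0]]
T, u = sufficient_statistics(groups)
print(T, u)
```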

Now our purpose is to select a region from the infinite family $F(\epsilon)$ of all
regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$ which we judge
most satisfactory for controlling errors of the second kind. Roughly speaking,
this region will have to be such that, if the hypothesis H is not true, the observed
point E will fall in this particular region as frequently as possible, in general.
Here we come to the necessity of specifying the ways in which we expect the
hypothesis H to be untrue. It may be untrue in an infinite number of ways.
For example, the values of the $\sigma$'s may (1) be equally distributed over any given
range, (2) may fall into just two groups, $\sigma_i = 1$ and $\sigma_i = 2$, or (3) all $\sigma_i$'s except
the last may have the same value $\sigma$ while the last is $10\sigma$, and so forth. Any
such assumption will be called an hypothesis alternative to H. It is obvious
that the probability of E falling in any given region w will be different for each
of them. Therefore, if we wish to deduce a test which will detect the falsehood
of the hypothesis tested frequently, we must analyse the practical cases where
the test is to be applied and guess the ways in which the hypothesis tested is
usually wrong. Then we can deduce a test which will be, in one sense or
another, most sensitive to the assumed deviations from the hypothesis tested.
Needless to say, our guess may be right or wrong. In the latter case, an
increased volume of observational material may demonstrate its fallacy and
suggest the necessary modifications. In any case, it is important to know exactly
the class of alternatives for which our test is, in some particular way, the best.

5. The set of hypotheses alternative to H. Let us consider the routine analyses
made at some laboratory and try to discover the circumstances likely to
cause variation in their accuracy. First of all, we may think of assignable
causes such as a change in personnel, apparatus, or accommodation. These
and similar causes are likely to produce lasting effects; the test of the hypothesis
that they did not reduces to one of the equality of only two $\sigma$'s. An easy
application of known theory [20] shows that the familiar F or z test is unbiased
of type $B_1$, which means that it is preferable to any other. Consequently,
situations of this kind, and also similar ones for which the $L_1$ test is applicable [9],
need not be considered here, so that we may concentrate on cases where there is
no directly assignable cause of variation in the accuracy of the analyses. Assume
then that the personnel, the apparatus, the accommodation, etc., remain
the same. Now the accuracy of analyses depends on a multitude of causes
evading identification, such as changes in the efficiency of the workers. In
principle, they try to have the highest, and therefore a constant, level of accuracy.
Uncontrollable circumstances cause some fluctuations about a certain average
and we expect that small deviations from this average will occur more frequently
than large ones. With this in mind, the author feels that it would be appropriate
to expect that variations in accuracy, if any, will have a random character
so that any $\sigma_i$ referring to one particular group of analyses, or any monotonic
function of that $\sigma$, could be considered as an essentially positive random variable,
having some unimodal probability law. To make the problem of the best test
sufficiently specific, we must specify this law entirely. Here we face a somewhat
embarrassing freedom of choice. For lack of more precise information as
to the random variability of $\sigma_i$, we guide ourselves by considerations of ease in
calculations. From this point of view it is convenient to consider the variable

(62)  $h = \sigma^{-2}$

and assume that, within a given period of time which is not too long, when the
conditions in a laboratory are sensibly constant, it is varying according to the law

(63)  $p(h) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, h^{\alpha - 1} e^{-\beta h}$  for $0 < h$,

where $\alpha$ and $\beta$ are unknown non-negative constants. It is useful to express these
constants in terms of two new ones which have an obvious interpretation: $h_0$, the
expectation of h, and v, the square of the coefficient of variation of h. Easy
calculations give

(64)  $\alpha = 1/v, \qquad \beta = 1/(h_0 v).$

Now p(h) has the form

(65)  $p(h) = \frac{h^{\frac1v - 1}\, e^{-h/(h_0 v)}}{(h_0 v)^{1/v}\, \Gamma(1/v)}.$


We note that when $v \to 0$ the probability law (65) tends to a limiting
discontinuous form with $P\{h = h_0\} = 1$. This corresponds to the hypothesis H
that we wish to test. The type of laws represented by (65) is known to be
rather flexible. Consequently, we may easily assume that even though the true
variability of h (or $\sigma$) does not exactly correspond to (65), there will be a system
of values of $h_0$ and v for which the difference between the true law and (65) will
not be large. Therefore, a test which is particularly sensitive to deviations of v
from zero with law (65) will be reasonably sensitive in real practical cases.
However, this is an assumption by the author. But it is subject to test and this
will be done below.
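
The interpretation of $h_0$ and v in (64) can be illustrated by simulation. The sketch below is not part of the original argument: it draws values of h from the law (63) with Python's standard gamma generator and compares the sample mean and the squared coefficient of variation with $h_0$ and v. The numerical values of $h_0$ and v are arbitrary.

```python
import random

def gamma_params(h0, v):
    """Shape alpha and rate beta of law (63) in terms of h0 and v, as in (64)."""
    return 1.0 / v, 1.0 / (h0 * v)

def sample_h(h0, v, size, rng):
    alpha, beta = gamma_params(h0, v)
    # random.gammavariate takes (shape, scale); the scale is 1/beta.
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]

rng = random.Random(0)
h0, v = 2.0, 0.25
hs = sample_h(h0, v, 20000, rng)
mean = sum(hs) / len(hs)
var = sum((h - mean) ** 2 for h in hs) / (len(hs) - 1)
cv2 = var / mean ** 2        # squared coefficient of variation
print(round(mean, 2), round(cv2, 2))
```

As v is made smaller the simulated values of h cluster ever more tightly about $h_0$, which is the limiting case corresponding to H.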

Formula (63) represents the hypothetical probability law of the variable h
which is not directly observable. We must use this formula to obtain the
probability law of the observable x's alternative to (6), which corresponds to the
hypothesis H being true. Using $h = 1/\sigma^2$, we write the relative probability law
of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ given h

(66)  $p(x_{i,1}, \ldots, x_{i,n} \mid h) = \left(\frac{h}{2\pi}\right)^{n/2} e^{-\frac12 h \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2}.$

Multiplying (66) by (65) we obtain the joint probability law of h and the $x_{i,j}$'s
referring to one group of analyses

(67)  $p(x_{i,1}, \ldots, x_{i,n}, h) = \frac{h^{\frac n2 + \frac1v - 1}}{(2\pi)^{n/2} (h_0 v)^{1/v}\, \Gamma(1/v)} \exp\Big\{-h\Big(\frac{1}{h_0 v} + \frac12 \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\Big)\Big\}.$

Integrating (67) with respect to h from zero to infinity, we obtain the absolute
probability law of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, all referring to the ith group of analyses.
Assuming that the value of h in one group of analyses is independent of that in
another, we obtain the joint probability law of all the Nn observable $x_{i,j}$'s by
simply multiplying the probability laws referring to particular groups of n of
them. The result will depend on N + 2 unknown parameters, $\xi_1, \xi_2, \ldots, \xi_N$,
$h_0$, and v. As the last two will play a more important role than the others
we shall denote the probability law by $p(E \mid h_0, v)$. Easy calculations give

(68)  $p(E \mid h_0, v) = \left(\frac{\Gamma(n/2 + 1/v)}{(2\pi)^{n/2}\, \Gamma(1/v)}\right)^{N} (h_0 v)^{\frac12 Nn} \prod_{i=1}^{N} \Big(1 + \frac{h_0 v}{2} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\Big)^{-(\frac n2 + \frac1v)}.$

We easily check that for $v \to 0$ (68) approaches the law (6) with $h_0 = \sigma^{-2}$.
Therefore, the problem that we shall treat below will be to assume that the
observable x's follow (68) with some $h_0 > 0$ and some $v \ge 0$ and to test the
hypothesis H that v = 0. More specifically, we shall try to choose among all
the regions of the family $F(\epsilon)$ found in the preceding section the one over which
the integral of the function (68) is, in general, the largest.
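
The step from (67) to (68) rests on a single gamma integral. A sketch of the calculation for one group, writing $Q_i = \sum_{j} (x_{i,j} - \xi_i)^2$ (a shorthand introduced here only for brevity):

```latex
\int_0^\infty h^{\frac n2 + \frac1v - 1}
   \exp\Big\{-h\Big(\frac{1}{h_0 v} + \frac{Q_i}{2}\Big)\Big\}\,dh
 = \Gamma\Big(\frac n2 + \frac1v\Big)
   \Big(\frac{1}{h_0 v} + \frac{Q_i}{2}\Big)^{-(\frac n2 + \frac1v)}
 = \Gamma\Big(\frac n2 + \frac1v\Big)\,(h_0 v)^{\frac n2 + \frac1v}
   \Big(1 + \frac{h_0 v}{2}\,Q_i\Big)^{-(\frac n2 + \frac1v)} .
```

Dividing by $(2\pi)^{n/2} (h_0 v)^{1/v}\, \Gamma(1/v)$ leaves the factor $(h_0 v)^{n/2}$ for each group, and multiplying over the N independent groups gives a product of the form (68).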

Before doing so, it may be useful to exhibit some experimental evidence in
favor of the assumption that, if $\sigma$ is not constant in some conditions of analysis
or measurement, then it varies in such a way that the variability of the x's has
at least some characteristics appropriate to (68).

Introduce the notation

(69)  $\omega_i = n S_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2.$

Using transformations (49), (50), and (69), successively, we easily deduce the
probability law of $\omega_i$,

(70)  $p(\omega) = \left(\frac{h_0 v}{2}\right)^{\frac12 (n-1)} \frac{\Gamma(\frac12 (n-1) + 1/v)}{\Gamma(\frac12 (n-1))\, \Gamma(1/v)} \cdot \frac{\omega^{\frac12 (n-3)}}{(1 + \frac12 h_0 v\, \omega)^{\frac12 (n-1) + \frac1v}}.$

If the hypothesis we have made about the variability of h, as expressed by (65),
is true in any particular case then the sums of squares (69), referring to each
particular group of analyses, are distributed according to (70). The reverse is
not necessarily true, of course, but it is comforting that a check of the above
in a number of broadly divergent circumstances gives satisfactory results. By
applying the transformation $1 + h_0 v\, \omega_i/2 = t^{-1}$, the integral of (70) is easily
reduced to an Incomplete Beta function whence Pearson's tables [24] provide
an easy means of calculating the theoretical probability that $\omega_i$ is within any
given limits.
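
In place of printed tables, the probability that $\omega_i$ falls below a given limit can be obtained by numerical quadrature. The sketch below evaluates it twice, once by integrating (70) directly and once after the substitution $1 + h_0 v\, \omega/2 = t^{-1}$, under which t follows a Beta law with exponents $1/v$ and $\frac12 (n-1)$. The midpoint rule and the parameter values are choices made here for illustration, not taken from the paper.

```python
import math

def omega_density(w, n, h0, v):
    """Density (70) of omega = n * S^2 for one group."""
    m = 0.5 * (n - 1)
    c = 0.5 * h0 * v
    log_const = (m * math.log(c) + math.lgamma(m + 1.0 / v)
                 - math.lgamma(m) - math.lgamma(1.0 / v))
    return math.exp(log_const + (m - 1.0) * math.log(w)
                    - (m + 1.0 / v) * math.log1p(c * w))

def midpoint(f, lo, hi, steps=20000):
    """Plain midpoint rule; adequate for a smooth integrand."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def prob_direct(w, n, h0, v):
    """P{omega <= w} by integrating (70) directly."""
    return midpoint(lambda x: omega_density(x, n, h0, v), 0.0, w)

def prob_beta(w, n, h0, v):
    """Same probability after the substitution 1 + h0*v*omega/2 = 1/t."""
    m = 0.5 * (n - 1)
    a = 1.0 / v
    log_b = math.lgamma(a) + math.lgamma(m) - math.lgamma(a + m)
    t0 = 1.0 / (1.0 + 0.5 * h0 * v * w)

    def beta_pdf(t):
        return math.exp((a - 1) * math.log(t) + (m - 1) * math.log(1 - t) - log_b)

    # omega <= w corresponds to t >= t0
    return midpoint(beta_pdf, t0, 1.0)

p1 = prob_direct(3.0, n=5, h0=1.0, v=0.2)
p2 = prob_beta(3.0, n=5, h0=1.0, v=0.2)
print(abs(p1 - p2) < 1e-4)
```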

Table I gives several observed distributions of the sums $\omega_i$ together with their
expected ones, calculated from (70) with the values of $h_0$ and v fitted by the
method of moments. The last lines give particulars of the application of the
$\chi^2$ test for goodness of fit.

The origin of the data used to compile Table I is as follows:

For the data providing frequency distributions numbered 1 and 2, the author
is deeply indebted to Professor Raymond T. Birge. The methods of measurement
and their purpose are explained in the publications [25] and [26], respectively.
These papers also contain various compilations of the results of the
measurements. However, the original single measurements, necessary for the
present paper, are naturally unpublished and Professor Birge was kind enough
to find them for the author in his records.

TABLE I
Comparison of empirical distributions of $\omega$ with those calculated from (70)

Classes of $\omega$: 0-1, 1-2, ..., 25-26, 26-32, 32-43, >43.
Expected (Exp.) and observed (Obs.) frequencies for successive classes; "…" marks an entry that is not legible.

No. 1. Source: R. T. Birge.
  Exp.  29.38  19.30  13.11   9.16   6.56   4.80   3.59  …
  Obs.  29     20     17      7      6      1      4     …
  Total: Exp. 100.00, Obs. 100.  $\chi^2$ = 9.63

No. 2. Source: R. T. Birge.
  Exp.  15.10  13.14  11.30   9.84   8.46   7.24   6.17   5.23   4.40   3.69  …
  Obs.  17     11     15      5      9      9      11     4      2      2     …
  Total: Exp. 100.00, Obs. 100.  $\chi^2$ = 12.67

No. 3. Source: K. Buszczyński and Sons.
  Exp.  15.56  12.67  10.70   8.98   7.53   6.34   5.36   4.54   3.86  …
  Obs.  16     17     13      2      11     …      3      7      4     …
  Total: Exp. 100.00, Obs. 100.  $\chi^2$ = 18.…

No. 4. Source: A. A. Michelson, F. G. Pease and F. Pearson.
  Exp.   3.50   7.78   9.37   …      9.28   8.60   7.80   …      6.22   5.52   4.88   4.32   3.82  …
  Obs.   2      10     13     …      17     7      7      7      …      …      3      5      3     …
  Total: Exp. 123.00, Obs. 123.  $\chi^2$ = 18.00

No. 5. Octane rating.
  Exp.  14.90  18.88  …      13.93  11.20   8.91   7.04   5.58   4.43   3.52  …
  Obs.  17     …      …      12     10      7      10     …      7      7     …
  Total: …  $\chi^2$ = 13.35

Degrees of freedom: …   $P(\chi^2)$: …

In the calculation of $\chi^2$, classes were grouped so as to have the expected
frequency in a class at least equal to 3.5.

Frequency distribution No. 3 was compiled from a book of records of sugar
beet trials carried out by Messrs. K. Buszczyński and Sons, Ltd. in Górka
Narodowa, Poland.

The 4th distribution was constructed from the original measurements of the
velocity of light as published [27] by Michelson, Pease, and Pearson. The
measurements made during single days were treated as forming separate groups.

Distribution No. 5 originated from repeated measurements of Octane Rating
conducted by a refining company in California. They were made accessible by
Mr. Walter S. Kvonson and it is a pleasure to express the author's deep
gratitude to him.

The number of observations in each column is not very large. It may be
expected that if it were increased, the differences between the hypothetical
distributions and the observed ones would become more apparent. It seems
safe, however, to assume that in a number of instances the hypothesis as to the
character of the variability of $\omega_i$ is not in very bad disagreement with the actual
facts. It would be most interesting to have some more data on the subject.

6. The best critical region for testing H against a particular alternative. It
seems unquestionable that the most desirable test of any hypothesis is the
uniformly most powerful test (U. M. P. Test) with respect to the whole class of
simple hypotheses alternative to the one which is being tested. Denote by H
the hypothesis tested, by h any simple, admissible hypothesis alternative to H,
and by $\Omega$ the set of all h's. If $w_0$ is the critical region corresponding to the
U. M. P. Test, then $w_0$ has these properties:

(71)  (1) $P\{E \in w_0 \mid H\} = \epsilon.$

(2) If w is any other region such that $P\{E \in w \mid H\} = \epsilon$ then

(72)  $P\{E \in w_0 \mid h\} \ge P\{E \in w \mid h\},$

whatever be $h \in \Omega$.

Following the known method [18], we shall see whether a test of the hypothesis
H considered in the preceding sections exists which is a U. M. P. Test with
respect to the whole class of admissible hypotheses that specify the probability
laws (68) with any $h_0 > 0$ and v > 0.

The method consists of considering one particular alternative hypothesis h',
that is, one particular set of values of $h_0 > 0$ and v > 0, and finding the best
critical region $w_{h_0,v}$ for testing H against h'. If this region appears to depend
on v and/or on $h_0$ then there is no U. M. P. Test. The region $w_{h_0,v}$ is found by
determining, for each system $(\varphi)$ of $T_1, T_2, \ldots, T_{N+1}$ separately, a part $w_{h_0,v}(\varphi)$
determined by the inequality

(73)  $p(E \mid h_0, v) \ge k(\varphi)\, p(E \mid H)$

where $k(\varphi)$ is a function of $T_1, T_2, \ldots, T_{N+1}$ determined so that the condition
(60) is satisfied. Substituting (6) and (68) in (73), taking the logarithm of both
sides, and combining all terms which are constant or depend only on $T_1, T_2, \ldots,
T_{N+1}$, we have

(74)  $\sum_{i=1}^{N} \log\big(1 + \tfrac12 h_0 v\, n (S_i^2 + (T_{i+1} - \xi_i)^2)\big) \le k_1(T_1, \ldots, T_{N+1}, h_0, v).$

Clearly, for $T_1, T_2, \ldots, T_{N+1}$ fixed, this inequality imposes a restriction on the
variability of $u_1, u_2, \ldots, u_{N-1}$ while $z_{1,1}, \ldots, z_{N,n-2}$ are allowed to vary
indiscriminately within the extreme limits (52). But the region $w_{h_0,v}$ determined
by (74) also depends on the product $h_0 v$. Therefore, there is no uniformly most
powerful test for testing H against any and all simple alternatives specifying (68).

7. A critical region of an unbiased type. There seems to be no ground for
dissension that when a U. M. P. Test exists and is readily applicable, it is
preferable to any other test, but the situation is quite different when there is no
U. M. P. Test. In such cases, practical considerations may suggest a variety
of requirements for a second best test of the hypothesis. Among these, we may
suggest the following considerations:

Fix, for a moment, the values of $h_0, \xi_1, \ldots, \xi_N$, take any region w of the
family $F(\epsilon)$, and consider the probability of E falling in w as a function of v
only. This is called the power function

(75)  $\beta(v \mid w) = \int \cdots \int_{w} p(E \mid h_0, v)\, dx_{1,1} \cdots dx_{N,n}.$

Here, of course, $v \ge 0$. Because of the properties of regions belonging to $F(\epsilon)$
we have $\beta(0 \mid w) = \epsilon$. If v > 0, the value of $\beta(v \mid w)$ represents the
corresponding probability of the test (based on w) discovering the falsehood of H.
It is obviously desirable to have this probability as large as possible. In any
case, it should be greater than $\epsilon$. This last restriction is known as that of
unbiasedness [19], [20], [28]. Further, since it is impossible to maximize $\beta(v \mid w)$
for all values of v, we must choose those for which it is most desirable, in our
opinion, to concentrate our efforts to increase $\beta(v \mid w)$. One possible point of
view is that these values should be very close to the hypothetical value v = 0.
For if v is considerably larger than zero, we may argue that there will be no
need to apply any refined statistical test to detect the falsehood of H. Of
course, this argument has no mathematical character and its general acceptance
is not suggested. In fact, we may argue that if v is greater than zero but very
small, it will be almost impossible to detect the falsehood of H by any test and,
therefore, our efforts should be concentrated on values of v which are of
considerable size.

These are considerations of non-mathematical character; the role of mathematical
statistics is limited to devising tests and elucidating their properties.
If these last are understood by practical statisticians, each may choose according
to his problem. Note that what could be termed the "properties" of a test are
summarized in the power function $\beta(v \mid w)$ with its relation to the power
functions of other possible tests of the same hypothesis.

In this paper we shall deal with tests particularly sensitive to small deviations
of v from its hypothetical value v = 0. In this respect, our first trial is to find
a region $w_0$, belonging to the family $F(\epsilon)$ and satisfying the condition

(76)  $\frac{\partial \beta(v \mid w_0)}{\partial v}\Big]_{v=0} \ge \frac{\partial \beta(v \mid w)}{\partial v}\Big]_{v=0}$

where w is any other region belonging to the same family $F(\epsilon)$.

Because of the peculiar structure of the regions belonging to $F(\epsilon)$, the problem
is immediately reduced to finding regions $w_0(\varphi)$. According to theory explained
elsewhere [18] these should satisfy the condition

(77)  $\frac{\partial p(E \mid h_0, v)}{\partial v}\Big]_{v=0} \ge k(T)\, p(E \mid H),$

where k(T) depends on $T_1, T_2, \ldots, T_{N+1}$ only and is determined to satisfy the
condition of similarity (60). Condition (77) is equivalent to

(78)  $\frac{\partial \log p(E \mid h_0, v)}{\partial v}\Big]_{v=0} \ge k(T).$

Taking the logarithm of (68), differentiating with respect to v, putting v equal
to zero, substituting in (78), and combining all the terms which are constant
on $W(\varphi)$ into a single term which we may write as $\tfrac18 n^2 h_0^2\, k_1(T)$, we have

(79)  $\sum_{i=1}^{N} \big(S_i^2 + (T_{i+1} - \xi_i)^2\big)^2 \ge k_1(T).$
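
The passage from (78) to (79) can be sketched as follows, writing $Q_i = S_i^2 + (T_{i+1} - \xi_i)^2$ and $c = \tfrac12 n h_0$ (both shorthands are introduced here only for brevity). Only the last factor of (68) involves the observations, and its logarithm expands as

```latex
-\Big(\frac n2 + \frac1v\Big)\sum_{i=1}^{N}\log(1 + c\,v\,Q_i)
 = -c\sum_{i=1}^{N} Q_i
   + v\Big(\frac{c^2}{2}\sum_{i=1}^{N} Q_i^2 - \frac{n c}{2}\sum_{i=1}^{N} Q_i\Big)
   + O(v^2).
```

Since $\sum Q_i = T_1 + \sum (T_{i+1} - \xi_i)^2$ is fixed once the T's are fixed, the derivative with respect to v at v = 0 depends on the sample point only through $\tfrac{c^2}{2} \sum Q_i^2 = \tfrac18 n^2 h_0^2 \sum Q_i^2$, which is the sum of squares appearing in (79) up to a constant factor.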


We note that condition (79) determining, so to speak, the shape of the region
$w_0(\varphi)$ does not imply any restriction on the variability of the z's but only on
the u's. However, the region $w_0(\varphi)$ as determined by (79) has the disadvantage
of being dependent on the values of the $\xi_i$. Since these are not specified by
the hypothesis tested, we are not able to determine the critical regions belonging
to the family $F(\epsilon)$ and maximizing the derivative $\partial \beta(v \mid w)/\partial v]_{v=0}$. The region
which does so for some particular system $\xi_1, \ldots, \xi_N$ of values of the $\xi$'s
will lose this property if the system of values of the $\xi$'s is appropriately changed.
Therefore, our choice of the region maximizing the derivative of the power
function at v = 0 should be made not from the whole family $F(\epsilon)$ but from a
sub-family composed only of such regions which also possess the supplementary
property that

(80)  $\frac{\partial \beta(v \mid w)}{\partial v}\Big]_{v=0} = \text{constant},$

that is, has a value independent of $\xi_1, \ldots, \xi_N$. The determination of this
sub-family $F_1(\epsilon)$ embracing all such regions is an interesting problem. Until it is
solved, we use an obvious subfamily $F_2(\epsilon)$ of regions w which has the desired
property, but we do not know whether or not $F_2(\epsilon)$ contains all such regions.*
The family $F_2(\epsilon)$ is defined as consisting of those regions, belonging to $F(\epsilon)$,
which could be described as cylindrical with their generators parallel to the
intersection of $T_{i+1} = x_{i\cdot}$ = constant, for $i = 1, 2, \ldots, N$. In other words,
and more precisely, a region w of the family $F(\epsilon)$ belongs to $F_2(\epsilon)$ if the
question of its including a given point E depends only on N(n - 1) of its coordinates,
namely on $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ and not on $T_2, T_3, \ldots, T_{N+1}$.

We easily show that any region w belonging to $F_2(\epsilon)$ has the property
that its power function is independent of the $\xi_i$. Denote by w' the set of
systems of values of $T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ corresponding to points
included in any given region w of the family $F_2(\epsilon)$. We see that the power
function $\beta(v \mid w)$, equal to the integral of (68) over w, can be calculated by using
the transformations (47), (49), and (51). Then the region of integration for
$T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ is what we have just denoted by w' and the
integrations for $T_2, \ldots, T_{N+1}$ extend from $-\infty$ to $+\infty$ for any fixed
values of the other variables. These integrations are easily carried out by
substituting

(81)  $(T_{i+1} - \xi_i)^2 = (1 + \tfrac12 n h_0 v S_i^2)\, y_i^2.$

The final result is

(82)  $\beta(v \mid w) = \int \cdots \int_{w'} p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})\, dT_1\, du_1 \cdots dz_{N,n-2}.$

Here

(83)  $p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}) = c(v)\, f(T_1, u, z) \Big/ \prod_{i=1}^{N} (1 + \tfrac12 n h_0 v S_i^2)^{\frac12 (n-1) + \frac1v}$

where c(v) denotes a constant depending on v, $f(T_1, u, z)$ denotes a function of
the variables involved which is free of v, and $S_i^2$ denotes expressions
(47) for short. We see that (82) is independent of the $\xi_i$'s.

Since a region w of the family $F_2(\epsilon)$ belongs to $F(\epsilon)$, it is composed of
sections $w(\varphi)$ selected separately on each hypersurface $T_1$ = constant and
$T_{i+1}$ = constant, $i = 1, 2, \ldots, N$. By the definition of $F_2(\epsilon)$, these sections
are independent of $T_2, \ldots, T_{N+1}$, so that each of them can be selected
independently for each value of $T_1$ and may be denoted by $w(T_1)$.
As far as property (80) is concerned, the choice is arbitrary. But the property
of similarity requires condition (60), which, in the present case, reads

(84)  $\int \cdots \int_{w(T_1)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \epsilon.$

* … Probably … differs from one of the regions of $F_2(\epsilon)$ by a set of measure zero only.


Applying the method already used, we find that sections $w(T_1)$ of the region w
belonging to $F_2(\epsilon)$ and maximizing the derivative $\partial \beta(v \mid w)/\partial v]_{v=0}$ are determined,
separately for each value of $T_1$, by the inequality

(85)  $\frac{\partial \log p(T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})}{\partial v}\Big]_{v=0} \ge k_2(T_1)$

where $k_2(T_1)$ denotes a function of $T_1$ determined to satisfy (84).
Substituting (83) in (85) we easily find that this condition is equivalent to

(86)  $\zeta = \sum_{i=1}^{N-1} u_i^2 + \Big(1 - \sum_{i=1}^{N-1} u_i\Big)^2 \ge k_3(T_1)$

where, again, $k_3(T_1)$ is determined for each particular value of $T_1$ to satisfy (84).
As (86) does not imply any restrictions on the variability of $z_{1,1}, z_{1,2}, \ldots, z_{N,n-2}$,
the integrations for the z's while calculating (84) must be carried out over the
extreme limits (52). This will reduce the integrand to the relative probability
law of $u_1, u_2, \ldots, u_{N-1}$ given all the T's. This law is easily calculated from
(58) and is

(87)  $p(u_1, u_2, \ldots, u_{N-1} \mid T_1, T_2, \ldots, T_{N+1}) = \frac{\Gamma(\frac12 N(n-1))}{\Gamma^{N}(\frac12 (n-1))} \Big(\big(1 - \sum_{i=1}^{N-1} u_i\big) \prod_{i=1}^{N-1} u_i\Big)^{\frac12 (n-3)} = p(u_1, u_2, \ldots, u_{N-1}).$

As (87) is independent of $T_1, T_2, \ldots, T_{N+1}$, it is also the absolute probability
law of the u's and hence $k_3(T_1)$ is independent of $T_1$. In accordance with the
notation adopted for the left side of (86), namely $\zeta$, and since the choice of
$k_3(T_1)$ depends on $\epsilon$, n, and N, we may use $\zeta_\epsilon$ instead of $k_3(T_1)$. Then the region
w is determined by the inequality

(88)  $\zeta = \sum_{i=1}^{N-1} u_i^2 + \Big(1 - \sum_{i=1}^{N-1} u_i\Big)^2 > \zeta_\epsilon$

or, returning to the original variables, by the inequality

(89)  $\zeta = \sum_{i=1}^{N} S_i^4 \Big/ \Big(\sum_{i=1}^{N} S_i^2\Big)^2 > \zeta_\epsilon$


where $\zeta_\epsilon$ is the root of the equation

(90)  $\frac{\Gamma(\frac12 N(n-1))}{\Gamma^{N}(\frac12 (n-1))} \int \cdots \int_{\zeta > \zeta_\epsilon} \Big(\big(1 - \sum_{i=1}^{N-1} u_i\big) \prod_{i=1}^{N-1} u_i\Big)^{\frac12 (n-3)} du_1 \cdots du_{N-1} = \epsilon.$

This region w has the following property: of all the regions belonging to the
family $F_2(\epsilon)$ the derivative of the power function of w at the point v = 0 is the
greatest. Thus, as far as the values of v close to zero are concerned, we may
say that, for testing H, w is the most powerful critical region in the family $F_2(\epsilon)$.
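
The criterion is simple to compute. The sketch below evaluates $\zeta$ both from the $S_i^2$, as in (89), and from the ratios $u_i = S_i^2 / \sum S_i^2$, as in (88), and confirms that the two forms agree. The figures are made up for illustration, and the critical value $\zeta_\epsilon$ would still have to be obtained from (90).

```python
def zeta_from_s2(s2):
    """Criterion (89): sum of S_i^4 divided by the square of the sum of S_i^2."""
    total = sum(s2)
    return sum(s * s for s in s2) / total ** 2

def zeta_from_u(u):
    """Criterion (88) in terms of u_1..u_{N-1}."""
    last = 1.0 - sum(u)              # plays the role of u_N
    return sum(x * x for x in u) + last * last

s2 = [0.8, 1.1, 0.9, 2.4, 1.0]       # illustrative values of S_i^2 for N = 5 groups
t1 = sum(s2)
u = [s / t1 for s in s2[:-1]]
z = zeta_from_s2(s2)
print(abs(z - zeta_from_u(u)) < 1e-12)
```

Large values of $\zeta$ arise when a few groups account for most of the total variation, which is the kind of departure from H that the criterion is meant to detect.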





8. Methods of determining $\zeta_\epsilon$. To calculate $\zeta_\epsilon$ accurately we must calculate
the integral probability law of $\zeta$, that is to say,

(91)  $P\{\zeta < z\} = \int \cdots \int_{\zeta < z} p(u_1, \ldots, u_{N-1})\, du_1 \cdots du_{N-1}$

for any z. The author was not able to achieve this. Therefore some methods
of approximation had to be looked for. This task becomes somewhat simplified
by noting that in most practical problems N will be very large, in the hundreds
or thousands, while n will probably not exceed 5.

To start, we notice that the range of $\zeta$ is limited by

(92)  $1/N \le \zeta \le 1.$

The easiest way to see this is to look for maxima and minima of the sum

(93)  $X = \sum_{i=1}^{N} S_i^4$

subject to the restriction that

(94)  $\sum_{i=1}^{N} S_i^2 = T_1.$

We then easily find that

(95)  $T_1^2/N \le X \le T_1^2$

and (92) follows directly.
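
The bounds (92) and the behaviour of $\zeta$ under H can also be explored by simulation. This is not one of the methods of the paper, only a numerical sketch: it uses the fact that under H each $n S_i^2/\sigma^2$ is a chi-square variate with n - 1 degrees of freedom, so that $S_i^2$ is proportional to a gamma variate of shape $\frac12 (n-1)$. The values of n, N, the number of draws and the level 0.05 are arbitrary.

```python
import random

def simulate_zeta(n, N, draws, rng):
    """Simulated values of zeta under H (common sigma in all N groups)."""
    a = 0.5 * (n - 1)
    out = []
    for _ in range(draws):
        # zeta does not depend on the scale, so a unit scale is used
        s2 = [rng.gammavariate(a, 1.0) for _ in range(N)]
        total = sum(s2)
        out.append(sum(s * s for s in s2) / total ** 2)
    return out

def upper_quantile(values, eps):
    """Empirical value exceeded by roughly a fraction eps of the draws."""
    ordered = sorted(values)
    return ordered[int((1.0 - eps) * len(ordered))]

rng = random.Random(12345)
n, N = 3, 50
zs = simulate_zeta(n, N, 2000, rng)
zeta_eps = upper_quantile(zs, 0.05)   # rough simulated critical value
print(min(zs) >= 1.0 / N, max(zs) <= 1.0)
```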

Since $\zeta$ is a polynomial of the second order in the u's, we may consider its
moments. These will be functions of the expectations of the products $\prod_{i=1}^{N} u_i^{k_i}$
where, for short, $u_N = 1 - \sum_{i=1}^{N-1} u_i$. Using (87) we easily find that

(96)  $E\Big(\prod_{i=1}^{N} u_i^{k_i}\Big) = \frac{\Gamma(\frac12 N(n-1))}{\Gamma(\frac12 N(n-1) + \sum_{i=1}^{N} k_i)} \prod_{i=1}^{N} \frac{\Gamma(\frac12 (n-1) + k_i)}{\Gamma(\frac12 (n-1))}.$

In particular, if we let (n - 1)/2 = a

(97)  $E(u_i^2) = \frac{a(a+1)}{Na(Na+1)}$

(98)  $E(u_i^4) = \frac{a(a+1)(a+2)(a+3)}{Na(Na+1)(Na+2)(Na+3)}$

(99)  $E(u_i^2 u_j^2) = \frac{a^2 (a+1)^2}{Na(Na+1)(Na+2)(Na+3)}.$

Consequently and because $\zeta = \sum_{i=1}^{N} u_i^2$, we have

(100)  $E(\zeta) = m_1 = (a+1)/(Na+1)$

(101)  $E(\zeta^2) = m_2 = \sum_{i=1}^{N} E(u_i^4) + 2 \sum_{i<j} E(u_i^2 u_j^2)$
       $= \frac{(a+1)(a+2)(a+3)}{(Na+1)(Na+2)(Na+3)} + \frac{(N-1)\,a\,(a+1)^2}{(Na+1)(Na+2)(Na+3)}.$

The variance $\sigma_\zeta^2$ of $\zeta$ is therefore

(102)  $\sigma_\zeta^2 = \frac{2a(a+1)(N-1)}{(Na+1)^2 (Na+2)(Na+3)}.$

By a similar procedure we find that

(103)  $E(\zeta^3) = m_3 = \frac{(a+1)(a+2)(a+3)(a+4)(a+5) + 3(N-1)\,a\,(a+1)^2 (a+2)(a+3) + (N-1)(N-2)\,a^2 (a+1)^3}{(Na+1)(Na+2)(Na+3)(Na+4)(Na+5)}$

(104)  $E(\zeta^4) = m_4 = \Big[\prod_{j=1}^{7} (a+j) + 4(N-1)\,a\,(a+1) \prod_{j=1}^{5} (a+j) + 3(N-1)\,a \prod_{j=1}^{3} (a+j)^2$
       $\qquad + 6(N-1)(N-2)\,a^2 (a+1)^3 (a+2)(a+3) + (N-1)(N-2)(N-3)\,a^3 (a+1)^4\Big] \Big/ \prod_{j=1}^{7} (Na+j).$
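
The moment formulae lend themselves to exact evaluation. The sketch below codes (100), (101), (103) and (104) with rational arithmetic and derives the mean, the standard error and K. Pearson's $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ of $\zeta$; for n = 3 and N = 100 the mean and standard error come out as .0198 and .001922. The function names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def rising(x, k):
    """Product (x + 1)(x + 2)...(x + k)."""
    out = Fraction(1)
    for j in range(1, k + 1):
        out *= x + j
    return out

def raw_moments(n, N):
    """m1..m4 of zeta from (100), (101), (103), (104), with a = (n - 1)/2."""
    a = Fraction(n - 1, 2)
    A = N * a
    m1 = (a + 1) / (A + 1)
    m2 = (rising(a, 3) + (N - 1) * a * (a + 1) ** 2) / rising(A, 3)
    m3 = (rising(a, 5)
          + 3 * (N - 1) * a * (a + 1) ** 2 * (a + 2) * (a + 3)
          + (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 3) / rising(A, 5)
    m4 = (rising(a, 7)
          + 4 * (N - 1) * a * (a + 1) * rising(a, 5)
          + 3 * (N - 1) * a * rising(a, 3) ** 2
          + 6 * (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 3 * (a + 2) * (a + 3)
          + (N - 1) * (N - 2) * (N - 3) * a ** 3 * (a + 1) ** 4) / rising(A, 7)
    return m1, m2, m3, m4

def frequency_constants(n, N):
    """Mean, standard error, beta_1 and beta_2 of zeta, as floats."""
    m1, m2, m3, m4 = raw_moments(n, N)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return float(m1), sqrt(mu2), float(mu3 ** 2 / mu2 ** 3), float(mu4 / mu2 ** 2)

for N in (100, 200, 400):
    print(N, frequency_constants(3, N))
```

Working with fractions avoids the heavy cancellation that occurs when central moments of such a narrowly concentrated variable are formed from raw moments in floating point.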


One possible method of approximating $\zeta_\epsilon$ is to use the formulae above, together
with the higher moments whose formulae are easy to deduce. Some convenient
known distribution, say $p_0(\zeta)$, could be fitted to have its first two or three
moments coincide with those of the unknown true distribution of $\zeta$. We would
then look for better approximations by means of the functions

(105)  $p_m(\zeta) = p_0(\zeta) \sum_{j=1}^{m} A_j \pi_j$

where the $\pi_j$'s denote polynomials which are orthogonal and normal with respect
to $p_0(\zeta)$ so that

(106)  $\int \pi_j \pi_k\, p_0(\zeta)\, d\zeta = 1$ if j = k, and $= 0$ if $j \ne k$.

The constant coefficients $A_j$ are formed to minimize the integral

(107)  $\int \Big(p(\zeta) - p_0(\zeta) \sum_{j=1}^{m} A_j \pi_j\Big)^2 \frac{d\zeta}{p_0(\zeta)}.$

They are expressible in terms of the known moments of $p(\zeta)$.





This is one possible way to approximate $p(\zeta)$ which would eventually lead to
the computation of $\zeta_\epsilon$ even for small values of N.

Remembering that we are concerned with large N's, we can prove that the
normalized distribution of $\zeta$, that is, the distribution of

(108)  $\frac{\zeta - m_1}{\sigma_\zeta}$

tends to be normal as $N \to \infty$. However, the process of tending to the limit
is rather slow as may be seen from the following table of K. Pearson's $\beta_1$ and $\beta_2$.

TABLE II
Frequency constants of the distribution of $\zeta$

  n     N     $E(\zeta)$   $\sigma_\zeta$    $\beta_1$    $\beta_2$
  3    100     .0198      .001922     .8052      5.012
  3    200     .0099      .000693     .1018      4.211
  3    400     .0050      .000248     .2110      3.587

Because of this and also because the proof that the distribution of (108) tends
to normality is not very straightforward, we shall not reproduce it. But it may
be well to point out that the cause of this slowness in tending to the limit lies
in the skewness of the distribution of each particular $u_i$ and in the mutual
dependency of all the $u_i$'s.

The most promising method seems to be the following. First consider the
two sums

(109)  $T_1 = \sum_{i=1}^{N} S_i^2$  and  $T_0 = \sum_{i=1}^{N} S_i^4.$

Obviously, these two sums satisfy the conditions of the limiting theorem of
S. Bernstein [29], [30] and, therefore, as $N \to \infty$, their joint normalized
distribution tends to a normal surface. Also, we may expect the process of tending
to the limit to be rapid in this case. If $p(T_0, T_1)$ denotes the limiting normal
distribution, the probability that $\zeta > z$ can be approximately calculated by the
integral

(110)  $P\{\zeta > z\} = P\{T_0 > z T_1^2\} = \int_{-\infty}^{+\infty} dT_1 \int_{z T_1^2}^{\infty} p(T_0, T_1)\, dT_0.$

To calculate the limiting distribution $p(T_0, T_1)$ we need only the expectations,
say A and B, of $T_1$ and $T_0$ respectively, their standard errors, say $\sigma_1$ and $\sigma_2$,
and their correlation coefficient R. These may be obtained from the moments
of the $S_i$'s.

Formula (110) can be used not only for tabulating the integral probability
law of $\zeta$ and for determining $\zeta_\epsilon$, but also for an approximate calculation of the
power function of the test. For, if the limiting probability law $p(T_0, T_1)$ is
calculated using the moments of $S_i$ calculated from (70) with some v > 0, then
the integral (110) calculated with $z = \zeta_\epsilon$ gives us the probability $P\{\zeta > \zeta_\epsilon \mid v\}$
of the test detecting the falsehood of the hypothesis tested, that is, the power
function.

To save space, we shall now calculate the constants A, B, $\sigma_1$, $\sigma_2$, and R as
functions of $v \ge 0$. The values appropriate to the case when the hypothesis
tested is true will then be obtained from the general formulae by the mere
substitution of v = 0.

Since all the constants above depend on the expectations of S\ k , we use formula 
(70) to calculate them. Denoting the expectation of Si 1 " by /x fe , we have 


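(111) $\mu_k = \frac{2\left(\tfrac{1}{2} n h_0 v\right)^{\frac{1}{2}(n-1)}}{B\left(\tfrac{1}{v},\, \tfrac{1}{2}(n-1)\right)} \int_0^{\infty} \frac{S^{2k+n-2}}{\left(1 + \tfrac{1}{2} n h_0 v S^2\right)^{\frac{1}{2}(n-1) + \frac{1}{v}}}\, dS.$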


Introducing the new variable

(112) $1 + \tfrac{1}{2} n h_0 v S^2 = t^{-1}$


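makes the integration straightforward and gives

(113) $\mu_k = \left(\frac{2}{n h_0 v}\right)^{k} \frac{\Gamma\left(\tfrac{1}{v} - k\right)\,\Gamma\left(\tfrac{1}{2}(n-1) + k\right)}{\Gamma\left(\tfrac{1}{v}\right)\,\Gamma\left(\tfrac{1}{2}(n-1)\right)}.$

This formula holds good if $1/v > k$. Otherwise the $k$th moment $\mu_k$ is divergent.
Since moments up to $\mu_4$ are required, this approximate method of calculating
the power function of the test is applicable only for $v < .25$.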

Substituting $k = 1, 2, 3, 4$ in (113), we have

(114)
$\mu_1 = \frac{1}{n h_0}\,\frac{n-1}{1-v},$
$\mu_2 = \left(\frac{1}{n h_0}\right)^{2} \frac{n^2-1}{(1-v)(1-2v)},$
$\mu_3 = \left(\frac{1}{n h_0}\right)^{3} \frac{(n^2-1)(n+3)}{(1-v)(1-2v)(1-3v)},$
$\mu_4 = \left(\frac{1}{n h_0}\right)^{4} \frac{(n^2-1)(n+3)(n+5)}{(1-v)(1-2v)(1-3v)(1-4v)},$


and now we have

(115) $A = \frac{N}{n h_0}\,\frac{n-1}{1-v}, \qquad B = \frac{N}{(n h_0)^2}\,\frac{n^2-1}{(1-v)(1-2v)},$

(116) $\sigma_1^2 = \frac{N}{(n h_0)^2}\,\frac{(n-1)(2 + v(n-3))}{(1-v)^2(1-2v)},$

(117) $\sigma_2^2 = \frac{2N}{(n h_0)^4}\,\frac{(n^2-1)(2 + v(n-3))(2(n+2) - v(5n+7))}{(1-v)^2(1-2v)^2(1-3v)(1-4v)},$

(118) $R^2 = \frac{2(n+1)(1-2v)(1-4v)}{(2(n+2) - v(5n+7))(1-3v)}.$
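The algebra leading from (113) to (118) can be checked numerically. The following is only a sketch, written under the assumption that the $S_i$ are independent with common moments $\mu_k$, so that $A = N\mu_1$, $B = N\mu_2$, $\sigma_1^2 = N(\mu_2 - \mu_1^2)$, $\sigma_2^2 = N(\mu_4 - \mu_2^2)$ and the covariance of $T_1$ and $T_0$ is $N(\mu_3 - \mu_1\mu_2)$. The function names and the trial values of $N$, $n$, $h_0$ and $v$ are illustrative, not taken from the paper.

```python
from math import gamma, isclose


def mu_gamma(k, n, h0, v):
    """mu_k = E[S^(2k)] from the gamma-function form (113); needs 1/v > k."""
    a = 0.5 * (n - 1)
    return ((2.0 / (n * h0 * v)) ** k
            * gamma(1.0 / v - k) * gamma(a + k)
            / (gamma(1.0 / v) * gamma(a)))


def mu_closed(k, n, h0, v):
    """Closed forms (114) for k = 1, 2, 3, 4."""
    numerators = {
        1: n - 1,
        2: n ** 2 - 1,
        3: (n ** 2 - 1) * (n + 3),
        4: (n ** 2 - 1) * (n + 3) * (n + 5),
    }
    denominator = 1.0
    for j in range(1, k + 1):
        denominator *= 1.0 - j * v
    return numerators[k] / ((n * h0) ** k * denominator)


def constants_from_moments(N, n, h0, v):
    """A, B, sigma_1^2, sigma_2^2, R^2 built directly from mu_1..mu_4.

    Assumes independent S_i with common moments (an assumption of this sketch).
    """
    m1, m2, m3, m4 = (mu_closed(k, n, h0, v) for k in (1, 2, 3, 4))
    var1 = N * (m2 - m1 ** 2)
    var2 = N * (m4 - m2 ** 2)
    cov = N * (m3 - m1 * m2)
    return N * m1, N * m2, var1, var2, cov ** 2 / (var1 * var2)


def constants_closed(N, n, h0, v):
    """A, B, sigma_1^2, sigma_2^2, R^2 from the closed forms (115) to (118)."""
    c = n * h0
    w = 2 + v * (n - 3)
    u = 2 * (n + 2) - v * (5 * n + 7)
    A = N * (n - 1) / (c * (1 - v))
    B = N * (n ** 2 - 1) / (c ** 2 * (1 - v) * (1 - 2 * v))
    var1 = N * (n - 1) * w / (c ** 2 * (1 - v) ** 2 * (1 - 2 * v))
    var2 = (2 * N * (n ** 2 - 1) * w * u
            / (c ** 4 * (1 - v) ** 2 * (1 - 2 * v) ** 2 * (1 - 3 * v) * (1 - 4 * v)))
    R2 = 2 * (n + 1) * (1 - 2 * v) * (1 - 4 * v) / (u * (1 - 3 * v))
    return A, B, var1, var2, R2
```

For any admissible trial values (for instance $N = 100$, $n = 3$, $h_0 = 1.5$, $v = .1$) the two routes should agree to rounding error, which is a useful guard against slips in transcribing the formulae.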





Inspecting formulae (115) to (118) makes us see that there is an advantage
in substituting two new variables

(119) $t_1 = \frac{n h_0}{N(n-1)}\, T_1, \qquad t_2 = \frac{(n h_0)^2}{N(n^2-1)}\, T_0$

for $T_1$ and $T_0$. Their expectations, say $\theta_1$ and $\theta_2$, are

(120) $\theta_1 = \frac{1}{1-v}, \qquad \theta_2 = \frac{1}{(1-v)(1-2v)}.$

Probably without any danger of confusion, the S.E.'s of $t_1$ and $t_2$ may be
denoted by $\sigma_1$ and $\sigma_2$ also and we shall have

(121) $\sigma_1^2 = \frac{2 + v(n-3)}{N(n-1)(1-v)^2(1-2v)}, \qquad \sigma_2^2 = \frac{2(2 + v(n-3))(2(n+2) - v(5n+7))}{N(n^2-1)(1-v)^2(1-2v)^2(1-3v)(1-4v)}.$

Of course, the correlation coefficient of $t_1$ and $t_2$ is the same as that of $T_1$ and $T_0$,
namely $R$. Obviously, the inequality $T_0 > zT_1^2$ is equivalent to $t_2 > z_1 t_1^2$
provided that

(122) $z = z_1\, \frac{n+1}{N(n-1)}.$

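Now the problem of calculating (110) is reduced to finding

(123) $P\{\zeta > z\} = P\{t_2 > z_1 t_1^2\} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-R^2}} \int_{-\infty}^{\infty} \int_{z_1 t_1^2}^{\infty} \exp\left\{ -\frac{1}{2(1-R^2)} \left[ \frac{(t_1-\theta_1)^2}{\sigma_1^2} - 2R\,\frac{(t_1-\theta_1)(t_2-\theta_2)}{\sigma_1\sigma_2} + \frac{(t_2-\theta_2)^2}{\sigma_2^2} \right] \right\} dt_2\, dt_1.$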


We may conveniently see the workings of the test proposed by considering
formula (123). First consider the case when the hypothesis tested is true. Both
$\theta_1$ and $\theta_2$ reduce to unity. The region of highest frequency is around the point
$(\theta_1, \theta_2)$. If $N$ is large then both $\sigma_1$ and $\sigma_2$ are small so that the region of
significant frequency is rather small. The integral (123) is to be taken over
the region above the parabola $t_2 = z_1 t_1^2$ passing through the origin of coordinates.
When $z_1$ is small and the parabola passes far below the point $(\theta_1, \theta_2)$, the
probability $P\{\zeta > z\}$ will be close to unity. When $z_1 = 1$ this probability will
be less than $\tfrac{1}{2}$ and it will diminish rapidly with further increases of $z_1$. Now
suppose that we have found the value $\zeta_\epsilon$ for which $P\{\zeta > \zeta_\epsilon\} = \epsilon$ and
consider what will happen to (123) when $z = \zeta_\epsilon$ and $v > 0$ is increased. Clearly, neither
of $\sigma_1$ and $\sigma_2$ nor $R$ are very sensitive to slight changes in $v$. Also $\theta_1$ will not
change very much. On the other hand, $\theta_2$ will increase rather fast. The final
conclusion is that the whole frequency surface corresponding to the integrand in
(123) will not change shape much but will shift to bring a greater amount of
frequency into the region of integration.

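To facilitate numerical calculations introduce

(124) $x = (t_1 - \theta_1)/\sigma_1, \qquad y = (t_2 - \theta_2)/\sigma_2.$

Now (123) may be rewritten as

(125) $P\{\zeta > z\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2} \left[ \frac{1}{\sqrt{2\pi}} \int_{y(x, z_1)}^{\infty} e^{-u^2/2}\, du \right] dx,$

where

(126) $y(x, z_1) = \frac{z_1(\theta_1 + \sigma_1 x)^2 - \theta_2 - R\sigma_2 x}{\sigma_2\sqrt{1-R^2}}.$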
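The reduction of the double integral (123) to a single quadrature can be sketched as follows. This is an illustration only: it assumes that the inner limit is obtained by conditioning the standardized $t_2$ on the standardized $t_1$ (mean $Rx$, variance $1 - R^2$), takes $R$ as the positive root of (118), and uses a plain trapezoidal rule. The function names, the step count and the integration range are choices of this sketch, and no attempt is made here to reproduce the printed tables.

```python
from math import erfc, exp, pi, sqrt


def constants(N, n, v):
    """theta_1, theta_2, sigma_1, sigma_2, R from (120), (121), (118); v < .25."""
    w = 2 + v * (n - 3)
    u = 2 * (n + 2) - v * (5 * n + 7)
    theta1 = 1.0 / (1 - v)
    theta2 = 1.0 / ((1 - v) * (1 - 2 * v))
    sigma1 = sqrt(w / (N * (n - 1) * (1 - v) ** 2 * (1 - 2 * v)))
    sigma2 = sqrt(2 * w * u / (N * (n ** 2 - 1) * (1 - v) ** 2
                               * (1 - 2 * v) ** 2 * (1 - 3 * v) * (1 - 4 * v)))
    # Positive root assumed: the covariance of T_1 and T_0 is positive.
    R = sqrt(2 * (n + 1) * (1 - 2 * v) * (1 - 4 * v) / (u * (1 - 3 * v)))
    return theta1, theta2, sigma1, sigma2, R


def normal_upper_tail(u):
    """Probability that a standard normal variable exceeds u."""
    return 0.5 * erfc(u / sqrt(2.0))


def prob_exceed(z1, N, n, v, half_width=8.0, steps=4000):
    """Bivariate-normal approximation to P{t_2 > z1 * t_1^2}."""
    theta1, theta2, sigma1, sigma2, R = constants(N, n, v)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        # Lower limit of the inner integral for this value of x.
        y = ((z1 * (theta1 + sigma1 * x) ** 2 - theta2 - R * sigma2 * x)
             / (sigma2 * sqrt(1.0 - R * R)))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * exp(-0.5 * x * x) * normal_upper_tail(y)
    return total * h / sqrt(2.0 * pi)
```

Calling `prob_exceed(z1, 100, 3, 0.0)` for increasing `z1` gives a decreasing sequence of probabilities, and for fixed `z1` the value grows with `v`, in line with the qualitative discussion of (123).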

Using formulae (125), (126) and (119) to (122), the following numerical
values were obtained.

TABLE III

$n = 3$, $N = 100$, $v = 0$.

 z_1        P{ζ ≥ z | v = 0}
 .8         .9126
 .9         .7305
 1.0        .4905
 1.1        .2847
 1.2        .1495
 1.3        .0730
 1.4        .0335
 1.5        .0148
 1.6        .00644
 1.7        .00288
 1.34450    .05000
 1.54563    .01000


TABLE IV

Power of the test for $n = 3$ and $N = 100$.

 ε      ζ_ε       v = .01    v = .16
 .05    .02689    .05823     .37482
 .01    .03091    .01234     .10699


The figures above are only approximate and we realize that the greater the
value of $v$ the less satisfactory is the approximation of the power function. A
check of the goodness of the approximation and, if it proves satisfactory, a few
numerical tables for practical applications of the test must be postponed to
another publication.

It is a pleasure to record the author's indebtedness to Elizabeth Scott
and also to Miss Julia Bowman for carrying out all the numerical work
connected with the present paper.


REFERENCES 

[1] J. Neyman, Jour. Roy. Stat. Soc., Vol. 97 (1934), p. 558.

[2] J. Neyman, Phil. Trans. Roy. Soc. London, Vol. 236-A (1937), p. 333.

[3] "Student", Biometrika, Vol. 19 (1927), p. 151.

[4] J. Przyborowski, Roczniki Nauk Rolniczych i Leśnych, Vol. 30 (1933), p. 1.

[5] J. Neyman, Lectures and Conferences on Mathematical Statistics, Washington, D. C. (1937).

[6] J. Berkson, Jour. Am. Stat. Assn., Vol. 33 (1938), p. 520.

[7] J. Berkson, Jour. Am. Stat. Assn., Vol. 35 (1940), p. 362.

[8] J. Neyman, Annals of Math. Stat., Vol. 11 (1940), p. 478.

[9] J. Neyman and E. S. Pearson, Bull. Int. Acad. Polon. Sci., Cracovie, (A) Vol. <1 (1931), p. 460.

[10] B. L. Welch, Biometrika, Vol. 27 (1935), p. 145.

[11] B. L. Welch, Stat. Res. Memoirs, Vol. 1 (1936), p. 52.

[12] U. S. Nair, Stat. Res. Memoirs, Vol. 1 (1936), p. 38.

[13] M. S. Bartlett, Proc. Roy. Soc., Vol. 160-A (1937), p. 268.

[14] S. S. Wilks and C. M. Thompson, Biometrika, Vol. 29 (1937), p. 124.

[15] D. T. Bishop and U. S. Nair, Jour. Roy. Stat. Soc. Suppl., Vol. 6 (1939), p. 89.

[16] E. J. G. Pitman, Biometrika, Vol. 31 (1939), p. 200.

[17] H. O. Hartley, Biometrika, Vol. 31 (1940), p. 249.

[18] J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. London, Vol. 231-A (1933), p. 289.

[19] J. Neyman and E. S. Pearson, Stat. Res. Memoirs, Vol. 1 (1936), p. 1.

[20] J. Neyman, Bull. Soc. Math. France, Vol. 63 (1935), p. 246.

[21] H. Hotelling, Ann. Math. Stat., Vol. 11 (1940), p. 271.

[22] W. Feller, Stat. Res. Memoirs, Vol. 2 (1938), p. 117.

[23] H. Cramér and H. Wold, Jour. London Math. Soc., Vol. 11 (1936), p. 291.

[24] K. Pearson, Tables of the Incomplete Beta Function, Biometrika Office, University College, London, 1934.

[25] R. T. Birge, Astrophysical Journal, Vol. 39 (1914), p. 50.

[26] R. T. Birge, Physical Review, Vol. 40 (1932), p. 207.

[27] A. A. Michelson, F. G. Pease, and F. Pearson, Astrophysical Journal, Vol. 82 (1935), p. 26.

[28] J. Neyman and E. S. Pearson, Stat. Res. Memoirs, Vol. 2 (1938), p. 25.

[29] S. Bernstein, Math. Ann., Vol. 97 (1926), p. 44.

[30] W. Kozakiewicz, Ann. Soc. Polon. Math., Vol. 13 (1934), p. 24.



A CONCISE ANALYSIS OF CERTAIN ALGEBRAIC FORMS 

By Franklin E. Satterthwaite 
State University of Iowa, Iowa City, Iowa 

Many of the statistics in common use are functions of homogeneous algebraic
forms in the items of the sample. Among such statistics are the mean, a linear
form; the variance, a quadratic form; and the product moment, a bilinear form.
With the extension of the science, the mathematical statistician is faced with
the study of more complex statistics and the associated algebraic forms and
matrices. The purpose of this paper is to set forth concise and efficient
notations and methods which may be used in such analysis.

We shall borrow the essential features of our notation from differential
geometry and tensor analysis. The Kronecker delta is defined as,

$\delta_j^i = 1, \quad i = j,$
$\delta_j^i = 0, \quad i \neq j.$

The summation convention provides that summation be performed with respect
to any index appearing twice in the same term. Thus,

$x_i y^i = x_1 y^1 + x_2 y^2 + \cdots .$

To extend the use of the summation convention, we shall frequently place
indices on the numeral, 1. Thus,

$1^i x_i = 1^1 x_1 + 1^2 x_2 + \cdots = x_1 + x_2 + \cdots .$

Symmetry in the calculations is more striking if the pair of summation indices
appears, one as a superscript, the other as a subscript. Therefore we allow the
shifting of an index from the one position to the other at will. Thus,

$x_i = x^i.$

Where no confusion will arise, indices may be placed outside of parentheses. 



The standard notations for averages will be used. 





Unless otherwise indicated, the symbol, $\Sigma$, will always stand for summation
over all unrepeated indices including any already averaged under conventions
(1) and (2). Thus,

2i J * N£\ 


The following simple formulas are fundamental to the arithmetic of this
notation. They are obvious upon the expansion of the summations. Each
index varies from 1 to $a$. These formulas are

SiX, = Si, 

liif = 5 {, 

2} 1/ = 1* , 
lU* - alt, 

5 i 

Oi «= a, 



The symbols of this notation obey the associative, commutative, and the 
distributive laws of simple arithmetic so that the operations of summation, 
multiplication, and squaring are very easy. Thus for the product of two linear 
forms we have 


n = (0 “ (a))**'- 


The sum of squares is obtained by the simple repetition of the form, 

(3) = 2(5^,)* = $*,)($'*/), 

= « 6lxfX k . 

Two other sums of squares occur so frequently that they should be particularly 



(4) 





(5) 


S(x, - 





The striking similarity in the coefficients of the second and final expressions for
the summations in (3), (4), and (5) should not be overlooked.

Where we have multiple classification of the variables, we may operate on 
each index separately. For example, in a four-way analysis of variance we may 
have the quadratic form, 


Q = 2(£,yjt. — ${j .. — x { . k . St. .} 2 , 



The rank is one of the important properties of a quadratic form or matrix. 
An experienced mathematician usually has a rule of thumb for determining the 
ranks of those quadratic forms occurring in statistical analysis. In order to 
formulate such rules of thumb into a simple and rigorous algebra, the author 
here defines a type of matrix multiplication which he calls "uncontracted matrix
multiplication" and which he represents by the symbol, $\odot$.


Let $A = \|a_j^i\|$ and $B = \|\beta_l^k\|$ be two matrices of any finite orders and with
ranks $R_A$ and $R_B$. We define the uncontracted product, $A \odot B$, as follows:

$C = A \odot B = \|a_j^i\| \odot B = \|a_j^i B\| = \begin{Vmatrix} a_1^1 B & a_2^1 B & \cdots \\ a_1^2 B & a_2^2 B & \cdots \\ \vdots & \vdots & \end{Vmatrix},$

where

$a_j^i B = \|a_j^i \beta_l^k\|.$

Thus the elements of $C$ are the products $a_j^i \beta_l^k$.

We therefore see that whenever we have a matrix whose elements can be
factored in the above manner, then the matrix can be expressed as the
uncontracted product of simple matrices. Thus,

if $\|\gamma_{jl\cdots}^{ik\cdots}\| = \|a_j^i \beta_l^k \cdots\|$,

then $\|\gamma_{jl\cdots}^{ik\cdots}\| = \|a_j^i\| \odot \|\beta_l^k\| \odot \cdots .$


We shall now prove that the rank of the uncontracted product, $C = A \odot B$,
of two matrices is equal to the product of the ranks. This follows because for
the matrix, $A$, there always exists a set of elementary transformations defined
by the equations,
Ta 



&\, 0{ 5* 0, i j, 


where the $\varphi_j^i$, $i = j$, are coefficients providing for the multiplication of the
elements of a row by a constant not zero; the $\varphi$'s, $i \neq j$, are coefficients providing
for the addition to the elements of a row a linear function of the corresponding
elements of the other rows; the $\theta$'s are similar coefficients referring to columns;
the symbol $\leftrightarrow$ is an operator indicating the interchange of the $i$th and $j$th
rows (columns); and the ${}_A\delta_j^i$ have the values,

${}_A\delta_j^i = 1, \quad i = j \leq R_A,$
${}_A\delta_j^i = 0, \quad$ otherwise.


This set of transformations reduces $A$ to a diagonal matrix with $R_A$ non-zero
elements. A similar set of transformations,



exists for the matrix B. We next define two sets of transformations by the 
equations, 






which are also elementary because of their relationship to $T_A$ and $T_B$. Now
if we subject the matrix, $C = \|a_j^i \beta_l^k\|$, to the transformations $T'_A$ followed by
the transformations $T'_B$, it will be reduced to the diagonal form $\|{}_A\delta_j^i\, {}_B\delta_l^k\|$
with exactly $R_A R_B$ non-zero elements. Therefore, since the rank of a matrix is
invariant under elementary transformations, the rank of $C = A \odot B$ must
be $R_A R_B$.
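The theorem just proved can be tried on small matrices. The sketch below assumes that the uncontracted product, as defined by the block form $\|a_j^i B\|$, is what is nowadays usually called the Kronecker product; the function names are illustrative, and the rank is found by exact elimination over the rationals rather than by the transformations $T'_A$, $T'_B$ of the proof.

```python
from fractions import Fraction


def uncontracted(A, B):
    """Block matrix ||a_ij * B||: rows indexed by (i, k), columns by (j, l)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def rank(M):
    """Rank of a matrix by exact Gaussian elimination with rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n_cols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            if factor != 0:
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


A = [[1, 2], [2, 4], [0, 1]]      # rank 2
B = [[1, 1, 1], [1, 1, 1]]        # rank 1
C = uncontracted(A, B)            # 6 x 6 matrix
print(rank(A), rank(B), rank(C))
```

Here `rank(C)` equals `rank(A) * rank(B)`, as the theorem requires.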

We shall now determine the ranks of several matrices which occur frequently
in statistics:

$A_1 = \|1_i\| = \|1, 1, 1, \cdots\|, \qquad R_1 = 1.$
$A_2 = \|1_i^j\| = \|1_i \cdot 1^j\| = \|1_i\| \odot \|1^j\|, \qquad R_2 = 1 \cdot 1 = 1.$



The proof that this rank is $a - 1$ involves two steps. First, summing the rows
of the matrix we have,



so that the rank is at most $a - 1$. Second, if we subtract the elements of the
first row from the corresponding elements of each of the other rows we obtain,


Ai 


, 11 1 

1-| 

_a |_a 

-1 | »i 


i — 1 
i * 1. 


Since the $(a - 1)$st order determinant in the lower right-hand corner is not
equal to zero, the rank is at least $a - 1$.

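Applying our theorem on uncontracted products, the ranks of complicated
matrices can often be determined by inspection. Thus:

$A_5 = \|\delta_j^i\,(\delta - \tfrac{1}{b})_l^k\|, \qquad R_5 = a \cdot (b - 1).$

$A_6 = \|(\delta - \tfrac{1}{a})_j^i\,(\delta - \tfrac{1}{b})_l^k\|, \qquad R_6 = (a - 1)(b - 1).$

$A_7 = \|(\delta - \tfrac{1}{a})_k^i\, y^k\,(\delta - \tfrac{1}{a})_j^l\, y_l\| = \|[(\delta - \tfrac{1}{a})_k^i\, y^k]\,[(\delta - \tfrac{1}{a})_j^l\, y_l]\|, \qquad R_7 = 1 \cdot 1 = 1.$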





The matrix $A_7$ may be confusing at first sight. Note that each element, $a_j^i$,
is a quadratic form in the $y$'s. This form is of rank 1 and can be factored into
two linear factors, one independent of $j$, the other independent of $i$.
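The ranks $a - 1$, $a(b - 1)$ and $(a - 1)(b - 1)$ quoted above may be confirmed on a small case. The following sketch assumes that the matrices concerned are built from the identity $\|\delta_j^i\|$ and from the deviation-from-mean matrix $\|\delta_j^i - 1/a\|$; that reading, the function names, and the choice $a = 3$, $b = 4$ are assumptions of the illustration.

```python
from fractions import Fraction


def identity(a):
    """The a x a matrix ||delta_ij||."""
    return [[Fraction(int(i == j)) for j in range(a)] for i in range(a)]


def deviation(a):
    """The a x a matrix ||delta_ij - 1/a||, which replaces items by deviations from their mean."""
    return [[Fraction(int(i == j)) - Fraction(1, a) for j in range(a)]
            for i in range(a)]


def uncontracted(A, B):
    """Block matrix ||a_ij * B|| (Kronecker product)."""
    return [[x * y for x in row_a for y in row_b] for row_a in A for row_b in B]


def rank(M):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(row) for row in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


a, b = 3, 4
rank_dev = rank(deviation(a))                                   # expected a - 1
rank_within = rank(uncontracted(identity(a), deviation(b)))      # expected a(b - 1)
rank_inter = rank(uncontracted(deviation(a), deviation(b)))      # expected (a - 1)(b - 1)
print(rank_dev, rank_within, rank_inter)
```

Each rank is obtained here by elimination on the full matrix, so the agreement with the product rule is an independent check rather than a restatement of it.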

To illustrate the application of these techniques to a fairly complicated
problem, we shall construct and verify a design for the analysis of variance, involving
a regression line. It is known that sufficient conditions for such a design to
be valid are:

1. The sum of the quadratic forms be equal to the sum of the squares of the
variables, and

2. The sum of the ranks of the forms be equal to the number of variables.

We shall use the first condition to set up our design. Thus,


( 6 ) 


2x), = [««]?,' 

■{[“-‘s 


-& + 
a 


+ 


[i--ito 1 n:.-- (axd:] 

Rewriting this in the usual notation, we have for our tentative design,


7 a:*!*". 


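(7) $\Sigma x_{ij}^2 = \Sigma[x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}]^2 + \Sigma[\bar{x}]^2 + \Sigma[\bar{x}_{\cdot j} - \bar{x}]^2 + \Sigma[(r\sigma_x/\sigma_y)(y_i - \bar{y})]^2 + \Sigma[(\bar{x}_{i\cdot} - \bar{x}) - (r\sigma_x/\sigma_y)(y_i - \bar{y})]^2.$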

In order to determine the corresponding equation for the ranks, we rewrite (6) 
in the form, 


(8) 


■*-{(■- iXO - 1 ) + (SO! ♦ ® - EX 
d(-M(-3'-Ka)8): 


+ 




xt M x 


First we must determine the rank of the unfamiliar matrix, 

* “ ll(* “ a). " (* - tj.(‘ - 

We see that the rank of $A_8$ cannot be greater than $a - 2$ because two linear
relations exist between the rows, namely,


l*a| — 0, since 
V*a{ = 0, since 






To show that the rank of $A_8$ cannot be less than $a - 2$, we subtract the elements
of the first row from the corresponding elements of each of the last $a - 2$ rows,
giving,


2 ) 

_^ _«<_ i -\2 

“ (* “ f) !/*(*! ” t[)yi (fi -rtl - Sl)i/t 

- v — fl/ - s --^1,2 


i i 
ct{ ~ ai 


Multiplying each element of the second column by ~ y\j ~ ^ y‘ 

and adding the result to the corresponding element of the $j$th column for $j = 3,
4, \cdots, a$, we see that the $(a - 2)$th order determinant in the lower right-hand
corner becomes $|\delta_j^i|$ which is not equal to zero. Therefore the rank of $A_8$ must
equal $a - 2$.

Referring to equation (8), we now write down the corresponding equation for
ranks using the theorem on uncontracted products. Thus,

$\Sigma\ \text{Ranks} = (a - 1)(b - 1) + (1)(1) + (1)(b - 1) + (1)(1)(1) + (a - 2)(1) = ab.$

Hence the quadratic forms in the right member of equation (7) are mutually
independent and each, measured in units of the variance of the population, is
distributed as is Chi-square with the appropriate number of degrees of freedom.



A SYMMETRIC METHOD OF OBTAINING UNBIASED ESTIMATES 
AND EXPECTED VALUES 


By Paul L. Dressel

Michigan State College, East Lansing, Michigan 


The problem of finding the relationship between moment functions of a
sample and moment functions of the population from which the sample was
obtained has, of necessity, received much attention. The problem has two
parts, first, to find the expected value of a given sample moment function;
second, to find the estimate of a given population moment function. Thus, if
$m_i$ represent the $i$th central moment of a sample and $\mu_i$ represent the $i$th central
moment of the population, the first part of the problem requires that we find
the mean value of $m_i$ for all possible samples of a given size and express it in
terms of the $\mu$'s. The second part requires that we find a function of the $m_i$'s
such that the mean value, taken for all possible samples of a given size, be a
given $\mu_i$. For the case $i = 4$ we have the well known results:

$E[m_4] = \frac{(n-1)(n^2-3n+3)}{n^3}\,\mu_4 + \frac{3(n-1)(2n-3)}{n^3}\,\mu_2^2,$

$E^{-1}[\mu_4] = \frac{n^2(n^2-2n+3)}{n^{(4)}}\, m_4 - \frac{3n^2(2n-3)}{n^{(4)}}\, m_2^2,$

where $n^{(4)} = n(n-1)(n-2)(n-3)$.




These results are based on the assumption of an infinite population. In spite
of the inverse relationship existing between estimates and expected value, the
expressions above show no simple relationship. This lack of simplicity of
relationship between estimate and expected value is directly traceable to the fact
that such results are usually obtained for infinite populations. When results
are obtained for finite populations a symmetry is found to exist which reduces
to a single problem the two parts stated above. Since this should be evident
to anyone upon reflection, the main purpose of the present paper may be
considered as that of indicating one method of demonstrating the result stated
above as well as showing the relationship of this method to material appearing in
previously published papers.
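The two infinite-population results for $i = 4$ can be verified exactly on a small discrete law. The sketch below treats sampling from an infinite population as independent draws from a uniform distribution on a few chosen values and enumerates every ordered sample; the values $0, 1, 3$, the sample size $n = 4$ and the function names are assumptions of the illustration.

```python
from fractions import Fraction
from itertools import product


def central_moment(xs, k):
    """k-th central moment of the numbers xs, with divisor len(xs)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** k for x in xs) / n


def check_fourth_moment_results(values, n):
    """Compare exact expectations over all ordered samples with the quoted formulas.

    Returns two booleans: whether E[m_4] matches the first formula, and whether
    the estimate of mu_4 built from m_4 and m_2^2 has expectation mu_4.
    """
    mu2 = central_moment(values, 2)
    mu4 = central_moment(values, 4)
    samples = list(product(values, repeat=n))      # equally likely ordered samples
    e_m4 = sum(central_moment(s, 4) for s in samples) / len(samples)
    e_m2_sq = sum(central_moment(s, 2) ** 2 for s in samples) / len(samples)

    formula_e_m4 = (Fraction((n - 1) * (n * n - 3 * n + 3), n ** 3) * mu4
                    + Fraction(3 * (n - 1) * (2 * n - 3), n ** 3) * mu2 ** 2)

    d = (n - 1) * (n - 2) * (n - 3)                # n^(4) / n
    e_estimate = (Fraction(n * (n * n - 2 * n + 3), d) * e_m4
                  - Fraction(3 * n * (2 * n - 3), d) * e_m2_sq)
    return e_m4 == formula_e_m4, e_estimate == mu4


print(check_fourth_moment_results([0, 1, 3], 4))
```

Rational arithmetic is used throughout so that the comparison is an identity and not merely an agreement to a few decimals.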

Consider a finite population consisting of $N$ items $x_1 \cdots x_N$ and samples of $n$
items taken from that population, the sampling being done without replacement.
We shall utilize the power product notation of P. S. Dwyer [1; p. 13]

(1) $(q_1 \cdots q_r) = \sum^{n}_{i \neq j \neq \cdots \neq l} x_i^{q_1} x_j^{q_2} \cdots x_l^{q_r}$

to represent a power product formed for the sample and

(2) $[q_1 \cdots q_r] = \sum^{N}_{i \neq j \neq \cdots \neq l} x_i^{q_1} x_j^{q_2} \cdots x_l^{q_r}$

to represent like power products formed for the population. An arbitrary
moment function of weight $r$ of the sample is indicated by

(3) $\Sigma\, a_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, (q_1)^{\pi_1} \cdots (q_s)^{\pi_s}$

and likewise a moment function of the population is indicated by

(4) $\Sigma\, A_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [q_1]^{\pi_1} \cdots [q_s]^{\pi_s}$

where the summation extends over all partitions of $r$.

It now is convenient to express each of the expressions (3) and (4) in terms
of power products. We shall utilize for this purpose an expansion theorem
which is the converse of a theorem stated by Dwyer, [1; p. 34] and [2; pp. 37-39],
which can be proved in a similar fashion.

This converse theorem follows:

If any isobaric sum of products of power sums indicated by

(5) $\Sigma\, A_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [p_1]^{\pi_1} \cdots [p_s]^{\pi_s}$

be expanded in terms of power products in a form indicated by

(6) $\Sigma\, B_{p_1^{\pi_1} \cdots p_s^{\pi_s}}\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, [p_1^{\pi_1} \cdots p_s^{\pi_s}]$

then the coefficient $B_r$ of the power sum $[r]$ is given by

(7) $B_r = \Sigma\, \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, A_{p_1^{\pi_1} \cdots p_s^{\pi_s}}$

and the coefficient $B_{r_1 \cdots r_m}$ of $[r_1 \cdots r_m]$ is

(8) $B_{r_1 \cdots r_m} = \overline{B_{r_1} B_{r_2} \cdots B_{r_m}}$

where the barred product indicates a symbolic multiplication by suffixing of
subscripts.

This is exemplified by

$B_{32} = \overline{B_3 B_2} = \overline{(A_3 + 3A_{21} + A_{111})(A_2 + A_{11})}$

$= A_{32} + A_{311} + 3A_{221} + 4A_{2111} + A_{11111}.$
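The symbolic multiplication by suffixing of subscripts is easily mechanized. In the sketch below a coefficient such as $B_3$ is held as a table from partitions to numerical multipliers of the corresponding $A$'s, the multipliers in (7) being computed from the factorial expression; the representation by Python tuples and the function names are choices of this illustration.

```python
from collections import Counter
from math import factorial


def partitions(r, largest=None):
    """Yield the partitions of r as non-increasing tuples."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest


def multiplier(p):
    """r! / ((p_1!)^pi_1 ... (p_s!)^pi_s pi_1! ... pi_s!) for the partition p of r."""
    value = factorial(sum(p))
    for part in p:
        value //= factorial(part)
    for multiplicity in Counter(p).values():
        value //= factorial(multiplicity)
    return value


def B(r):
    """B_r of (7) as a table {partition: multiplier of the corresponding A}."""
    return {p: multiplier(p) for p in partitions(r)}


def suffix_product(*factors):
    """Barred product: multiply the multipliers and join the subscripts."""
    result = {(): 1}
    for factor in factors:
        combined = Counter()
        for p, c in result.items():
            for q, d in factor.items():
                combined[tuple(sorted(p + q, reverse=True))] += c * d
        result = dict(combined)
    return result


B32 = suffix_product(B(3), B(2))
print(B32)
```

The table `B32` carries the multipliers 1, 1, 3, 4, 1 on the subscripts 32, 311, 221, 2111, 11111, in agreement with the example above.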

Using this theorem the moment functions (3) and (4) are easily expanded in
terms of power products. In this latter form the expected value of the sample
moment function is easily found by utilizing the fact that

$E\left[\frac{(q_1 \cdots q_r)}{n^{(r)}}\right] = \frac{[q_1 \cdots q_r]}{N^{(r)}}.$
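The relation between the expected sample power product and the population power product can be confirmed by enumerating every sample drawn without replacement from a small population. The sketch assumes that a power product is the sum over ordered selections of distinct items, and that $n^{(r)}$ denotes the factorial power $n(n-1)\cdots(n-r+1)$; the population values and function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations


def power_product(xs, qs):
    """Sum over ordered selections of distinct items of x_i^q1 * x_j^q2 * ..."""
    total = 0
    for indices in permutations(range(len(xs)), len(qs)):
        term = 1
        for i, q in zip(indices, qs):
            term *= xs[i] ** q
        total += term
    return total


def factorial_power(m, r):
    """m^(r) = m (m - 1) ... (m - r + 1)."""
    value = 1
    for j in range(r):
        value *= m - j
    return value


def expectation_holds(population, n, qs):
    """True if E[(q_1...q_r)] / n^(r) equals [q_1...q_r] / N^(r) exactly."""
    N, r = len(population), len(qs)
    samples = list(combinations(population, n))    # all samples without replacement
    expected = Fraction(sum(power_product(s, qs) for s in samples), len(samples))
    return (expected / factorial_power(n, r)
            == Fraction(power_product(population, qs), factorial_power(N, r)))


print(expectation_holds([1, 2, 4, 7, 11], 3, (2, 1)))
```

The identity holds because each ordered set of $r$ distinct items of the population has the same chance, $n^{(r)}/N^{(r)}$, of being contained in the sample.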





Now if the expected value of the sample moment function be equated to the
population moment function (both being in power product form) we obtain a
set of equations connecting the coefficients of a sample moment function and a
population moment function. Since either the coefficients of the sample
moment function or those of the population moment function may be assigned
and the others solved for, this set of equations enables one to solve two problems.
First, we may find unbiased estimates, that is, moment functions of the sample
such that their expected value is some preassigned population moment function.
Second, we may find expected values, that is, moment functions of the population
such that they are expected values of some preassigned sample moment function.
From the symmetry of this set of equations, we shall see that any result
obtained from the system has, through the symmetry, a dual role.

The foregoing discussion may be clarified by an example. Let $A_2[2] + A_{11}[1]^2$
be the population moment function. In terms of power products this becomes
$(A_2 + A_{11})[2] + A_{11}[11]$. The sample moment function $a_2(2) + a_{11}(1)^2$ becomes
in terms of power products $(a_2 + a_{11})(2) + a_{11}(11)$ and its expected value is

$\frac{n}{N}\,(a_2 + a_{11})[2] + \frac{n^{(2)}}{N^{(2)}}\, a_{11}[11].$

By equating this to the population moment function above we obtain

$n^{(2)} a_{11} = N^{(2)} A_{11},$

$n(a_2 + a_{11}) = N(A_2 + A_{11}),$

and the symmetry of the system is apparent.

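If

(9) $\rho_i = \frac{n^{(i)}}{N^{(i)}}, \qquad \tau_i = \frac{N^{(i)}}{n^{(i)}} = \frac{1}{\rho_i},$

the solutions of the system are

$a_{11} = \tau_2 A_{11}, \qquad A_{11} = \rho_2 a_{11},$

$a_2 = \tau_1 A_2 + (\tau_1 - \tau_2) A_{11}, \qquad A_2 = \rho_1 a_2 + (\rho_1 - \rho_2) a_{11}.$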

In a similar manner if we use moment functions of weight 3 we begin with

$A_3[3] + 3A_{21}[2][1] + A_{111}[1]^3,$

$a_3(3) + 3a_{21}(2)(1) + a_{111}(1)^3,$

and obtain the system of equations

$n^{(3)} a_{111} = N^{(3)} A_{111},$

$n^{(2)}(a_{21} + a_{111}) = N^{(2)}(A_{21} + A_{111}),$

$n(a_3 + 3a_{21} + a_{111}) = N(A_3 + 3A_{21} + A_{111}),$

with solutions

(10)
$A_{111} = \rho_3 a_{111},$
$A_{21} = \rho_2 a_{21} + (\rho_2 - \rho_3) a_{111},$
$A_3 = \rho_1 a_3 + 3(\rho_1 - \rho_2) a_{21} + (\rho_1 - 3\rho_2 + 2\rho_3) a_{111}.$

The solutions for the $a$'s in terms of the $A$'s are obtainable from the given results
in an obvious manner.

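If we use the Carver functions [3; p. 104]

$P_1 = \rho_1, \quad P_{11} = \rho_2, \quad \cdots, \quad P_{1^k} = \rho_k,$

$P_2 = \rho_1 - \rho_2, \quad P_{21} = \rho_2 - \rho_3, \quad \cdots$

$P_3 = \rho_1 - 3\rho_2 + 2\rho_3, \quad P_{22} = \rho_2 - 2\rho_3 + \rho_4, \quad \cdots$

$P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4,$

or in general

(11) $P_r = \sum_{t=1}^{r} \rho_t\, (-1)^{t-1}\, \Sigma\, \frac{r!\,(t-1)!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!},$

the inner summation extending over the partitions of $r$ into $t$ parts, and

$P_{r_1 r_2 \cdots r_m} = \overline{\overline{P_{r_1} P_{r_2} \cdots P_{r_m}}}$

where the double barred product indicates a symbolic multiplication by
addition of subscripts exemplified by

$P_{32} = \overline{\overline{P_3 P_2}} = \overline{\overline{(\rho_1 - 3\rho_2 + 2\rho_3)(\rho_1 - \rho_2)}}$

$= \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5;$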
the results (9) and (10) may be written

$A_{11} = P_{11} a_{11}, \qquad A_3 = P_1 a_3 + 3P_2 a_{21} + P_3 a_{111},$

$A_2 = P_1 a_2 + P_2 a_{11}, \qquad A_{21} = P_{11} a_{21} + P_{21} a_{111},$

$A_{111} = P_{111} a_{111}.$

Similarly for weight 4 we obtain

$A_4 = P_1 a_4 + 4P_2 a_{31} + 3P_2 a_{22} + 6P_3 a_{211} + P_4 a_{1111},$

$A_{31} = P_{11} a_{31} + 3P_{21} a_{211} + P_{31} a_{1111},$

$A_{22} = P_{11} a_{22} + 2P_{21} a_{211} + P_{22} a_{1111},$

$A_{211} = P_{111} a_{211} + P_{211} a_{1111},$

$A_{1111} = P_{1111} a_{1111}.$
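The Carver functions and their double barred products can be generated mechanically. The sketch below rests on the observation, assumed here and not stated in the text, that the inner sum in (11) taken over partitions of $r$ into $t$ parts equals $(t-1)!$ times the number of ways of dividing $r$ distinct objects into $t$ non-empty groups (a Stirling number of the second kind). A function $P$ is held as a table from the subscript $t$ of $\rho_t$ to its coefficient; the function names are illustrative.

```python
from math import factorial


def stirling2(r, t):
    """Number of ways of dividing r distinct objects into t non-empty groups."""
    if r == t:
        return 1
    if t == 0 or t > r:
        return 0
    return t * stirling2(r - 1, t) + stirling2(r - 1, t - 1)


def carver(r):
    """P_r as a table {t: coefficient of rho_t}, following (11)."""
    return {t: (-1) ** (t - 1) * factorial(t - 1) * stirling2(r, t)
            for t in range(1, r + 1)}


def add_subscripts(*factors):
    """Double barred product: rho_s times rho_t is replaced by rho_(s+t)."""
    result = {0: 1}
    for factor in factors:
        combined = {}
        for s, c in result.items():
            for t, d in factor.items():
                combined[s + t] = combined.get(s + t, 0) + c * d
        result = combined
    return result


P4 = carver(4)
P22 = add_subscripts(carver(2), carver(2))
P32 = add_subscripts(carver(3), carver(2))
print(P4, P22, P32)
```

The tables reproduce $P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4$, $P_{22} = \rho_2 - 2\rho_3 + \rho_4$ and $P_{32} = \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5$; functions such as $P_{31}$ or $P_{211}$, needed for weight 4, follow from the same two routines.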





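In general

(12) $A_r = \Sigma\, P_{\pi_1 + \pi_2 + \cdots + \pi_s}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, a_{q_1^{\pi_1} \cdots q_s^{\pi_s}}$

and

(13) $A_{r_1 \cdots r_m} = \overline{A_{r_1} A_{r_2} \cdots A_{r_m}},$

where as before the barred product indicates a symbolic multiplication by
suffixing of subscripts.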

If in

(14) $\Sigma\, a_{q_1^{\pi_1} \cdots q_s^{\pi_s}}\, \frac{r!}{(q_1!)^{\pi_1} \cdots (q_s!)^{\pi_s}\, \pi_1! \cdots \pi_s!}\, (q_1)^{\pi_1} \cdots (q_s)^{\pi_s}$

(15) $a_{q_1^{\pi_1} \cdots q_s^{\pi_s}} = \frac{(-1)^{\pi_1 + \pi_2 + \cdots + \pi_s - 1}\,(\pi_1 + \pi_2 + \cdots + \pi_s - 1)!}{n^{\pi_1 + \pi_2 + \cdots + \pi_s}},$

the moment function of the sample which is thereby represented is the Thiele
seminvariant $l_r$ of the sample. If the $A$'s are solved for by means of the
appropriate set of equations the expected value of $l_r$ is found. Thus we find


N 2 n w 


(16) 


Elk) = ^h, 

N*n w n™ N 2 

- $5® x ' + Jr#! <» - - »«.. 


m';\ - 




N n 


N w ri 


»r2 ( S) 

x * ~ N^n* (?l ~ N ^ nN ~ n ~ N ~ 1)1(4 ■ 


m n _-W w 6 5JVV 




N^n 6 


N m n 6 


{n — N)(Nn — 12)k 6 , 


E[h h] — 


6 (fi) 


N 6 n 


3 (3) 


Nn 


JV«>n 6 


XaXj “ jv^e ^ ~ N ^ Nn - n ~ N ~ 5)x & , 


w 


where the k system of seminvariants used hero is defined by 


(17) 


■fr-gfc-D* 

A i<mQ 

« 2r+ i = E (-i) <+r 
. {-0 




© 

( 2r V 

\i + ?•/ i 


2i + 1 
+ T + I 


/Ir-i/lr+i+l- 


By virtue of the symmetry noted earlier it follows that the estimates of the
Thiele seminvariants and products of these seminvariants of weight $\leq 5$ are
obtainable from the last results by replacing $E$ by $E^{-1}$ (estimate of), $\kappa_i$ by $k_i$,
$l_i$ by $\lambda_i$, and $N$ by $n$. In this manner we find that $L_4$, the estimate of $\lambda_4$, is


(18) u - JT ' M - w - „) Wn - 6)1,. 


It is of some interest to note in the results (16) above that in those expected
values or estimates which contain more than one term the factor $N - n$ occurs
in the second term. This, and the form of other coefficients involved in the
terms, shows that as the sample size approaches the population size the sample
seminvariants approach the population seminvariants. Another characteristic
of such results as those given in (16) is that infinite sampling formulas are easily
obtainable therefrom. Thus if in $L_4$ given in (18) $N \to \infty$, we find


r n i , n i 
— jj k -r -77, £4 

rtf - 45 n H) 


n 3 (n + 1 ) 
n w 


nu — 


3n 3 (n - 1) 
n«> 


m-i, 


the first of these forms checking the result given by Dressel [4; p. 45] and the
second form being identical with that given by Fisher [5].

The results exhibited above for finite sampling may lead to a mistaken idea 
about the simplicity of the results. Simplicity decreases rapidly as the weight 
increases. Thus for weight 6 we find 

m] = Xo + (n ~ N)iNn ~ 20)[8mo ~ 15m4M2 +1 ~ ^ 
N*n w 

+ mshi (n - X)[Nn(n + N) - 12 nN + 60] 

Jy m n“ 

(19) .[11/iB + 105/14/12 - 50 pi + 6 O/ 1 ?] 

- (n ~ N)[Nn(N 2 + nN + n) - 14 nN(N + n) + 71 Nn - 120] 

, 10 Mn (3) , , A , 6 n (s> . An , Ar . 2 n w , aA \ 

N w n 6 71 ^N w n* 71 N)(N-\-n 5) ]y (3, n 5 71 ^j Ke ’ 


Again by letting $N \to \infty$ infinite sampling results are obtained. Much of this
last result vanishes in that case.

It has been demonstrated that the $\kappa$ system of seminvariants are invariant
under estimation in the case of infinite sampling [4; p. 53]. It is therefore of
some interest to note that this system also possesses the property for finite
sampling without replacement. The proof of this is quite simple. Denote the
estimate of $\kappa_i$ by $K_i$ and the fundamental relations are





These expressions hold for any $n$ and hence for a population of $N$. Let $K'_{2r}$ and
$K'_{2r+1}$ denote functions corresponding to $K_{2r}$ and $K_{2r+1}$ but with population
moments replacing sample moments and we have

jrl N 1 J,-. „ A* 4 k , 

■n-jr ” mh * !r > ~ jyj *! Kiri1, 

Since the power product mode of formulation of $K_{2r}$ and $K_{2r+1}$ insures that

$E[K_{2r}] = K'_{2r}, \qquad E[K_{2r+1}] = K'_{2r+1},$


it follows that 


E[K ir } = E 





N % . 
f,ir » 


or 


AIM - 


n t 2 , iV 5 
«W«> % 


Similarly 


n w N s 

Afe r+ i) = 


thus establishing the theorem stated above. 


REFERENCES 

[1] P. S. Dwyer, "Combined expansions of products of symmetric power sums and of sums
of symmetric power products with applications to sampling," Part I, Annals of
Math. Stat., Vol. 9 (1938), pp. 1-47. Part II, Vol. 9 (1938), pp. 97-132.

[2] P. S. Dwyer, "Moments of any rational integral isobaric sample moment function,"
Annals of Math. Stat., Vol. 8 (1937), pp. 21-65.

[3] H. C. Carver, "Fundamentals of the theory of sampling," Annals of Math. Stat.,
Vol. 1 (1930), pp. 101-121; 260-274.

[4] P. L. Dressel, "Seminvariants and their estimates," Annals of Math. Stat., Vol. 11
(1940), pp. 33-57.

[5] R. A. Fisher, "Moments and product moments of sampling distributions," Proc.
Lond. Math. Soc., Vol. 2 (30), (1929), pp. 199-238.



DETERMINATION OF SAMPLE SIZES FOR SETTING 
TOLERANCE LIMITS 

By S. S. Wilks

Princeton University, Princeton, N. J. 

1. Introduction. In the mass production of a given product or apparatus
piece-part, Shewhart¹ has discussed a practical procedure for detecting the
existence of assignable causes of variation in a given quality characteristic of the
product as measured by a variable $x$. For example, $x$ may be the thickness in
inches of a washer or the tensile strength in pounds of a small aluminum casting
made according to a given set of specifications; $x$ varies in value from washer
to washer or from casting to casting. Now suppose assignable causes of
variability in $x$ have been detected by Shewhart's procedure and have been
sufficiently well eliminated by making appropriate refinements in the manufacturing
process so that for all practical purposes the remaining variability may be
considered "random," thus allowing us to assume that we have a statistical universe
$U$ in which $x$ is a random variable with some distribution law $f(x)$. $f(x)$ is, in
general, unknown and cannot be determined until long after the refined
manufacturing operation has been under way. Two types of situations arise in
practice, one in which $x$ is a discrete variable taking on only certain isolated values
as for example 1, 2, 3, $\cdots$, etc. with corresponding probabilities $p(1), p(2), \cdots$,
the other being that in which $x$ is essentially a continuous variable over some
range with a corresponding probability density function $f(x)$. In this paper we
shall consider the latter type of variable.

The problem now arises as to how we should calculate a tolerance range
$(L_1, L_2)$ for $x$ from a sample, and how large the sample should be in order for
the tolerance range to have a given degree of stability. More specifically, for a
given method of calculating tolerance limits, how large should our sample be in order
that the proportion $P$ of the universe included between $L_1$ and $L_2$ have an average
value $a$, and be such that the probability is at least $p$ that $P$ will lie between
two given numbers, say $b$ and $c$? For example, if a tolerance range is obtained
by using a truncated sample range, that is by letting $L_1$ be the greatest of the $r$
smallest values in a sample and $L_2$ the smallest of the $r$ largest values, $r$ being
chosen so that $E(P) = .99$, how large should the sample size, say $n$, be in order
for the probability to be .9 that $P$ would lie between .985 and .995? A similar
question can be asked when the setting of only one tolerance limit is under
consideration.


¹ W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van
Nostrand Company, New York, 1931.




2. Tolerance ranges from truncated sample ranges. Suppose that nothing is
known about the distribution function $f(x)$ except enough to enable us to assume
that it is continuous. Let $a$ be the average value which $P$ is to have, and suppose
a sample of size $n$ is drawn from the universe $U$ so that $\tfrac{1}{2}(1 - a)(n + 1) = r$,
say, is a positive integer. Let $x_1, x_2, \cdots, x_n$ be the sample values of $x$ arranged
in order of increasing magnitude. Let $L_1 = x_r$ and $L_2 = x_{n-r+1}$. The distribution
law, say $g(P)$, of $P$, the proportion of the universe included between these
values of $L_1$ and $L_2$, is given by

(1) $g(P)\,dP = \frac{\Gamma(n+1)}{\Gamma[a(n+1)]\,\Gamma[(1-a)(n+1)]}\, P^{a(n+1)-1}(1-P)^{(1-a)(n+1)-1}\, dP.$


This follows at once from the joint distribution law of $x_r$ and $x_{n-r+1}$ which can be
derived as follows: Consider the $x$ axis as being divided into $k$ mutually exclusive
intervals $I_1, I_2, \cdots, I_k$ with $p_1, p_2, \cdots, p_k$ as the associated probabilities
($\sum p_i = 1$). In a sample of size $n$ the probability that $n_1, n_2, \cdots, n_k$
($\sum n_i = n$) values of $x$ will fall into $I_1, I_2, \cdots, I_k$ respectively is given by
the well-known multinomial distribution law

(2) $\frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}.$


To get the distribution of $x_r$ and $x_{n-r+1}$ we take $k = 5$ and for $I_1, I_2, \cdots, I_5$
we take the intervals $(-\infty, x_r)$, $(x_r, x_r + dx_r)$, $(x_r + dx_r, x_{n-r+1})$,
$(x_{n-r+1}, x_{n-r+1} + dx_{n-r+1})$, $(x_{n-r+1} + dx_{n-r+1}, \infty)$ respectively. The values of
$p_1, p_2, \cdots, p_5$ are the integrals of $f(x)\,dx$ over these five intervals respectively and
the values of $n_1, n_2, \cdots, n_5$ are $r - 1$, 1, $n - 2r$, 1, $r - 1$ respectively. By substituting
these values of the $p$'s and $n$'s in (2) and neglecting terms of order higher than
$dx_r\, dx_{n-r+1}$ the probability element for $x_r$ and $x_{n-r+1}$ is found at once to be

(3) $\frac{n!}{[(r-1)!]^2\,(n-2r)!} \left(\int_{-\infty}^{x_r} f(x)\,dx\right)^{r-1} \left(\int_{x_r}^{x_{n-r+1}} f(x)\,dx\right)^{n-2r} \left(\int_{x_{n-r+1}}^{\infty} f(x)\,dx\right)^{r-1} f(x_r)\, f(x_{n-r+1})\, dx_r\, dx_{n-r+1}.$

Now let $\int_{-\infty}^{x_r} f(x)\,dx = u$, $\int_{x_{n-r+1}}^{\infty} f(x)\,dx = v$, then since $du = f(x_r)\,dx_r$ and
$dv = -f(x_{n-r+1})\,dx_{n-r+1}$, the probability element of $u$ and $v$ may be written as

(4) $$\frac{\Gamma(n+1)}{\Gamma^2(r)\,\Gamma(n-2r+1)}\,u^{r-1}v^{r-1}(1-u-v)^{n-2r}\,du\,dv,$$


2 For a discussion and a rather complete bibliography of the probability theory of "extreme
values" such as $x_r$ and $x_{n-r+1}$ see E. J. Gumbel, "Les valeurs extrêmes des distributions
statistiques," Annales de l'Institut H. Poincaré (1935).





the region of $u$ and $v$ of non-zero probability being the triangle bounded by the
$u$ and $v$ axes and the line $u + v = 1$. Making the change of variables
$1 - u - v = P$ and $u = Q$, integrating with respect to $Q$, and setting $r =
(1/2)(1 - a)(n + 1)$ we find the distribution of $P$, the proportion of the universe
included between $x_r$ and $x_{n-r+1}$, to be (1). It should be remarked that even
if $L_1$ and $L_2$ are obtained by asymmetrical truncation by taking $L_1 = x_s$, $L_2 = x_t$
where $t - s = n - 2r + 1$, the distribution of $P = \int_{L_1}^{L_2} f(x)\,dx$ remains unchanged.

Thus for a given $p$, by taking $L_1 = x_s$ and $L_2 = x_t$ where $t - s = n - 2r + 1 =
a(n + 1)$, and choosing the smallest value of $n$ for which $\int_b^c g(P)\,dP \ge p$
and such that $(1 - a)(n + 1)$ is a positive integer we have provided the answer
to the italicized question for one method of calculating $L_1$ and $L_2$; a method
which is valid for any unknown continuous distribution $f(x)$.

As an example, suppose we take $a = .99$, $b = .985$, $c = .995$ and $p = .90$.
The size of sample required is found to be 1000 (999 to be exact). In fact in
this case the probability of $P$ being between .985 and .995 is .902. In this
example, we may therefore make the statement that if $x$ is a continuous variable
under statistical control, and if samples of size 1000 are taken, the tolerance
limits $L_1$ and $L_2$ taken as the fifth smallest and fifth largest values of $x$ in the
sample respectively, will, on the average, include 99% of the universe between
them and furthermore, the tolerance limits calculated in this way for samples
of size 1000 will, in about 90.2% of the samples, include between 98.5% and
99.5% of the universe between them.
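
The probability used in this example can be checked numerically. The following is a minimal sketch in Python, not part of the original paper: it assumes that the integral of (1) between $b$ and $c$ may be evaluated through the standard identity between the incomplete Beta integral with integer parameters and a binomial tail sum. The function names `beta_cdf_int` and `prob_P_between` are hypothetical.

```python
import math

def beta_cdf_int(x, alpha, beta):
    """P(B <= x) for B ~ Beta(alpha, beta) with positive integer parameters.

    Uses the identity I_x(alpha, beta) = P(Bin(alpha + beta - 1, x) >= alpha),
    with each binomial term evaluated in logarithms to avoid overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    m = alpha + beta - 1
    total = 0.0
    for j in range(alpha, m + 1):
        log_term = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                    + j * math.log(x) + (m - j) * math.log1p(-x))
        total += math.exp(log_term)
    return total

def prob_P_between(n, r, b, c):
    """Probability that P, the proportion between x_r and x_{n-r+1}, lies in (b, c).

    By (1), P follows a Beta law with parameters n - 2r + 1 and 2r.
    """
    alpha, beta = n - 2 * r + 1, 2 * r
    return beta_cdf_int(c, alpha, beta) - beta_cdf_int(b, alpha, beta)

# The example of the text: n = 999 and r = 5, so that E(P) = 990/1000 = .99.
mean_P = (999 - 2 * 5 + 1) / (999 + 1)
prob = prob_P_between(999, 5, 0.985, 0.995)
print(mean_P, prob)
```

Repeating the call for increasing $n$ (restricted to values for which $r$ is an integer) gives the search for the smallest adequate sample size described above.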

If $L_1$ and $L_2$ are taken as the smallest and largest values of $x$ in the sample
respectively (corresponding to $r = 1$, i.e. sample range, with no truncation),
then in samples of size 1000, these tolerance limits will, on the average, include
99.8% of the universe between them and the probability is .96 that $L_1$ and $L_2$
will include at least 99.5% of the universe between them. If the largest and
smallest values of $x$ in samples are used as tolerance limits and if we wish to
state that the probability is .99 that such tolerance limits will include at least
99% of the universe, the size of sample required is 660. If the probability is
lowered to .95 of including at least 99% of the universe, with such tolerance
limits, the size of sample required is 130. Engineering statisticians³ have
pointed out on the basis of practical experience the need of using samples of 100 to
1000 or even more cases in order to set tolerance limits which will include at
least 99% of the universe with a satisfactorily high degree of certainty. The
examples we have given based on sizes 1000, 660 and 130 will indicate the degree
of stability to be expected for tolerance ranges for samples in this range of sizes.
The degree of stability of the tolerance limits for samples of the size range 500
to 1000 appears to be of about the order of that demanded by the engineering
statistician.


3 Cf. W. A. Shewhart, Statistical Methods from the Point of View of Quality Control, The
Graduate School of the U.S. Department of Agriculture, Washington (1939). P. 03.





In some cases it may be desirable to determine the size of samples so as to
control the tolerance limits $L_1$ and $L_2$ individually, that is, so that the probability
is at least $p$ that the proportions of the universe contained in the tails of the
distribution cut off by $L_1$ and $L_2$ are in both cases between two given numbers,
say $d$ and $c$. In this case we would determine the least value of $n$ so that

(5) $$\int_d^c\!\int_d^c h(u, v)\,du\,dv \ge p$$

where $h(u, v)\,du\,dv$ denotes the function given by (4). For example, suppose
$p = .99$, $d = 0$, $c = .005$, $r = 1$. The size of the sample needed is 1060.
Thus in samples of size 1060, the probability is .99 that $L_1$ and $L_2$ taken as
the smallest and the largest values in the sample respectively will cut off tails
of the universe such that each tail will include not more than 0.5% of the universe.
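
For the case $r = 1$, $d = 0$ the double integral in (5) has a simple closed form, which makes the search for $n$ easy to sketch. The Python fragment below is an illustration only, not the author's computation: it assumes the inclusion-exclusion form $1 - 2(1-c)^n + (1-2c)^n$ for the probability that both tails are at most $c$ (valid for $c \le 1/2$), and the function names are hypothetical. The least $n$ it finds can be compared with the rounded figure quoted in the text.

```python
def prob_both_tails_below(n, c):
    """P(u <= c and v <= c) for r = 1 and d = 0, from the density (4).

    P(u > c) = P(v > c) = (1 - c)^n and P(u > c, v > c) = (1 - 2c)^n,
    so inclusion-exclusion gives 1 - 2(1 - c)^n + (1 - 2c)^n.
    """
    return 1.0 - 2.0 * (1.0 - c) ** n + (1.0 - 2.0 * c) ** n

def least_n_both_tails(p, c):
    """Smallest n for which the probability in (5) is at least p (r = 1, d = 0)."""
    n = 2
    while prob_both_tails_below(n, c) < p:
        n += 1
    return n

n_needed = least_n_both_tails(0.99, 0.005)
print(n_needed, prob_both_tails_below(1060, 0.005))
```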

If it is desired to set only one tolerance limit, say $L_1$, then the distribution
of $u$ would be used. This can be found by integrating (4) with respect to $v$.
The distribution is

(6) $$\frac{\Gamma(n+1)}{\Gamma(r)\,\Gamma(n-r+1)}\,u^{r-1}(1-u)^{n-r}\,du.$$

The probability $p$ that the proportion of the universe in the tail which will be
cut off by $L_1$ is between $d$ and $c$ is given by integrating the expression (6) from
$d$ to $c$. The value of $n$ required to obtain any given value of $p$ can then be
determined. For example, in the case where $p = .99$, $d = 0$, $c = .005$, $r = 1$,
the size of the sample needed is 920.
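
The one-limit case can be sketched in the same way. The following Python fragment is an illustration under stated assumptions, not the author's procedure: it takes $d = 0$ and evaluates the integral of (6) from 0 to $c$ through the binomial identity $P(u \le c) = P(\mathrm{Bin}(n, c) \ge r)$, which holds for integer $r$. The names `prob_tail_below` and `least_n_one_tail` are hypothetical.

```python
import math

def prob_tail_below(n, r, c):
    """Integral of (6) from 0 to c: P(u <= c) with u ~ Beta(r, n - r + 1).

    For integer r this equals P(Bin(n, c) >= r).
    """
    lower = sum(math.comb(n, j) * c ** j * (1.0 - c) ** (n - j) for j in range(r))
    return 1.0 - lower

def least_n_one_tail(p, r, c):
    """Smallest n for which the tail cut off by x_r is at most c with probability p."""
    n = r
    while prob_tail_below(n, r, c) < p:
        n += 1
    return n

n_one = least_n_one_tail(0.99, 1, 0.005)
print(n_one)
```

For $r = 1$ the condition reduces to $1 - (1 - c)^n \ge p$, so the least $n$ could also be obtained directly from logarithms; the loop is kept so that the same function serves for $r > 1$.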


4 


3. Tolerance range for a normal universe. The method of setting tolerance
limits discussed in Section 2 assumes nothing about the distribution $f(x)$ except
that it is continuous. If $f(x)$ can be assumed to have a given functional form
involving unknown parameters, methods based on the theory of statistical estimation
and having greater efficiency than those already discussed could be
used for setting tolerance limits. We shall not go into a general discussion of
such methods here although it does appear desirable to consider one very important
example of the application of the methods. Suppose $f(x)$ can be assumed
to be a normal distribution function with unknown mean $m$ and variance $\sigma^2$.
In a sample of size $n$ let $\bar{x}$ be the sample mean and let $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n - 1)$.
Let us consider as tolerance limits $L_1'$ and $L_2'$ the quantities $\bar{x} \pm ks$. The proportion
$P'$ of the universe included between these limits is

(7) $$P' = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\bar{x}-ks}^{\bar{x}+ks} e^{-(x-m)^2/2\sigma^2}\,dx.$$


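We wish to determine $k$ so that $E(P') = a$. It can be verified by straightforward
analysis that $E(P')$, defined by $\int_{-\infty}^{\infty}\!\int_{0}^{\infty} P' f(\bar{x}, s)\,ds\,d\bar{x}$, has the value

(8) $$\frac{\Gamma(n/2)}{\sqrt{\pi(n-1)}\;\Gamma((n-1)/2)}\int_{-k\sqrt{n/(n+1)}}^{k\sqrt{n/(n+1)}}\frac{dt}{\left(1 + t^2/(n-1)\right)^{n/2}}$$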





where $f(\bar{x}, s)$ is the well-known distribution of $\bar{x}$ and $s$ given by

(9) $$\frac{\sqrt{n}\,(n-1)^{(n-1)/2}\,s^{n-2}}{2^{(n-2)/2}\,\sigma^{n}\sqrt{\pi}\;\Gamma((n-1)/2)}\;e^{-[n(\bar{x}-m)^2+(n-1)s^2]/2\sigma^2}.$$

Therefore the tolerance limits $L_1'$ and $L_2'$ which will include, on the average,
a proportion $a$ of the universe between them are

(10) $$\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$$

where $t_a$ is the value of $t$ for which the integral in (8) has the value $a$. The
value of $t_a$ can be found from Fisher's $t$-table for $n - 1$ degrees of freedom, and
for certain values of $a$ including .99, .95, etc. and for values of $n$ up to 30. Although
the tolerance limits (10) will include, on the average, the proportion $a$
of the universe between them, we must now investigate the size of sample
needed to obtain a given degree of stability of $P'$. The exact distribution of $P'$
seems to be too complicated to be of any practical value. It is not difficult to
verify that to within terms of order $1/n$, the variance of $P'$ is given by

(11) $$\sigma^2_{P'} = t_a^2\,e^{-t_a^2}/(\pi n).$$

The variance of $P$, the proportion of the universe included between $x_r$ and
$x_{n-r+1}$, to within terms of order $1/n$ is given by

(12) $$\sigma^2_{P} = a(1 - a)/n.$$

For a large sample of a given size, say $n = 100$ or more, a simple comparison
of the stabilities of the two tolerance ranges $(x_r, x_{n-r+1})$ and $(\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s)$
can be made by comparing $\sigma^2_{P}$ and $\sigma^2_{P'}$. For $a = .99$, the efficiency ratio $\sigma^2_{P'}/\sigma^2_{P}$
is .28 indicating that for large $n$ and when the universe is normal, samples of
size $.28n$ have the same degree of stability in setting tolerance ranges (10) as a
sample of size $n$ has when $(x_r, x_{n-r+1})$ is taken as the tolerance range. The same
thing may be viewed in another way: The fact that the range of values of $P'$ is
0 to 1 suggests that we may be able to get a fairly close approximation to the
true distribution of $P'$ by fitting a Pearson Type I function of the form

(13) $$\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,P'^{\,\alpha-1}(1-P')^{\beta-1},$$

determining $\alpha$ and $\beta$ by equating the mean and variance of the distribution (13)
to the mean and variance of $P'$ respectively. Accordingly we find

(14) $$\alpha = [a^2(1-a) - a\sigma^2_{P'}]/\sigma^2_{P'}, \qquad \beta = [a(1-a)^2 - (1-a)\sigma^2_{P'}]/\sigma^2_{P'}.$$

Thus it will be seen from (14) that in order for the fitted distribution (13) to be
identical with the distribution (1) a sample of only $\dfrac{t_a^2\,e^{-t_a^2}}{\pi a(1-a)}\,(n+2)$ cases is
needed.
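
The efficiency ratio quoted above follows directly from (11) and (12). The short Python sketch below reproduces the arithmetic; it is an illustration, not part of the original paper, and it assumes that for large $n$ the value $t_a$ may be replaced by the normal deviate leaving $(1 - a)/2$ in each tail. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def efficiency_ratio(a):
    """Large-sample ratio of the variance (11) to the variance (12).

    Assumption of this sketch: t_a is approximated by the standard normal
    quantile at (1 + a)/2, which is appropriate only for large n.
    """
    t = NormalDist().inv_cdf((1.0 + a) / 2.0)
    return t * t * math.exp(-t * t) / (math.pi * a * (1.0 - a))

def equivalent_normal_sample_size(n, a):
    """Number of cases for limits (10) matching a truncated-range sample of size n."""
    return efficiency_ratio(a) * (n + 2)

ratio = efficiency_ratio(0.99)
n_equiv = equivalent_normal_sample_size(1000, 0.99)
print(round(ratio, 3), round(n_equiv))
```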

In case only one tolerance limit is to be set, e.g. $\bar{x} - t_a\sqrt{(n+1)/n}\cdot s$, the
proportion, say $w'$, of the universe which will be included in the tail has mean
value $(1 - a)/2$ and variance $(2 + t_a^2)\,e^{-t_a^2}/(4\pi n)$ (approximately) for large $n$. The
ratio of this variance to that of $u$, which is approximately $(1 - a^2)/4n$ for
large $n$, gives the efficiency of using $L_1'$ for the lower tolerance limit in case of a
normal universe. For example, if $a = .99$, the efficiency is .18.


It is perhaps appropriate here to point out the distinction between confidence
limits and tolerance limits. It is well-known that in a sample from a normal
universe with mean $m$ the probability is $a$ that the confidence limits $\bar{x} \pm t_a s/\sqrt{n}$
will include the population mean $m$ between them. The tolerance limits
$\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$, on the other hand, are used to estimate the middle $100a\%$
of the universe. Although the tolerance limits $\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$ are much
more stable for a given sample size than those given by $x_r$ and $x_{n-r+1}$, in case
of a normal distribution, it should be emphasized that in case of even slight
non-normality, particularly when skewness is present, the former pair of limits
are apt to give very erroneous results with reference to the proportion of the
universe included in the tails. Confidence limits estimating $m$ are probably
much less sensitive to skewness than tolerance limits estimating the middle
$100a\%$ of the universe, particularly when $a$ is nearly unity.
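
The statement that the limits $\bar{x} \pm t_a\sqrt{(n+1)/n}\cdot s$ include the proportion $a$ of a normal universe on the average can be illustrated by simulation. The following is a minimal Monte Carlo sketch in Python, not part of the original paper. It assumes a standard normal universe, samples of $n = 10$, and the tabled value $t_a = 2.262$ ($a = .95$, 9 degrees of freedom); the function name, the seed and the number of repetitions are arbitrary choices made for the illustration.

```python
import math
import random
from statistics import NormalDist

def average_coverage(n, t_a, reps, seed=0):
    """Mean proportion of a N(0, 1) universe lying between xbar - k*s and xbar + k*s."""
    rng = random.Random(seed)
    universe = NormalDist()                     # the true universe, known here by construction
    k = t_a * math.sqrt((n + 1) / n)            # multiplier of s in the limits (10)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        total += universe.cdf(xbar + k * s) - universe.cdf(xbar - k * s)
    return total / reps

# t_a = 2.262 is the two-sided 5% point of t for 9 degrees of freedom.
cov = average_coverage(n=10, t_a=2.262, reps=20000)
print(cov)
```

Individual samples give proportions that scatter widely about the average, which is the instability measured by (11).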

Another important aspect of the problem of setting tolerance limits is the
following: Suppose small samples of a given size are taken from a universe
under statistical control. How many of these small samples should be taken
as a basis for determining tolerance limits $L_1$ and $L_2$ of some function, say $g$,
of the samples (e.g. the sum of the measurements in each sample) so that the
proportion of samples in the universe of such samples having values of $g$ between
$L_1$ and $L_2$ will have a given mean with a given degree of stability? One obvious
approach to this question is to consider a universe of samples in the same manner
in which we have considered a universe of individuals throughout the present
paper. This approach, however, does not make very efficient use of the observations,
but we shall not enter into a treatment of the problem here. This problem
and various related problems in the statistical methods of mass production
remain to be studied.


4. Summary. A method based on truncated sample ranges for determining
size of sample required for setting tolerance limits on a random variable $x$ having
any unknown continuous distribution $f(x)$ and having a given degree of stability
is given. A method for setting tolerance limits corresponding to a given degree
of stability in case $f(x)$ is normal is discussed and a comparison of the stabilities
of the tolerance limits set by the two methods in the normal case is made.
Illustrative examples of the methods are given.



ON A CERTAIN CLASS OF ORTHOGONAL POLYNOMIALS 


By Frank S. Beale 
Lehigh University, Bethlehem, Pennsylvania 

Introduction. E. H. Hildebrandt has demonstrated the following theorem¹:
If $y$ is a non-identically zero solution of the Pearsonian Differential Equation,

(1) $$\frac{1}{y}\frac{dy}{dx} = \frac{a_0 + a_1x}{b_0 + b_1x + b_2x^2} \equiv \frac{N}{D}, \qquad a_i,\ b_i \text{ real},$$

then

(2) $$\frac{D^{n-k}}{y}\frac{d^n}{dx^n}\left(D^k y\right) = P_n(k, x), \qquad n,\ k \text{ integers},\ n \ge 0,$$

is a polynomial in $x$ of degree $n$ at most. Hildebrandt has obtained various relations
connecting the $P_n(k, x)$ and their derivatives as well as a recurrence relation.

If in (2) we set $k = n$ there results from a proper choice of $N$ and $D$ in (1),
the classical Hermite, Laguerre, Jacobi and Legendre Polynomials. Many
properties of these classical polynomials have been obtained by numerous
investigators.²

One of the most important of these properties is that of orthogonality which
can be stated as follows: Consider a sequence of the classical polynomials $\Phi_i(x) =
x^i - S_i x^{i-1} + \cdots$. There exists an interval $(a, b)$ finite or infinite and a unique
weight function $\psi(x)$, monotonic non-decreasing over $(a, b)$ such that,

(3) $$\int_a^b \Phi_m(x)\,\Phi_n(x)\,d\psi(x) = 0, \qquad \text{for } n \ne m.$$


In the future we will refer to the type of orthogonality given by (3) with $\psi(x)$ monotonic
non-decreasing as orthogonality in the restricted sense. In order to determine
whether a given system of polynomials is orthogonal in the restricted sense we
have the following theorem:³

Theorem 1. In order that the sequence of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} +$


1 E. H. Hildebrandt, "Systems of polynomials connected with the Charlier expansions,
etc.," Annals of Math. Stat., Vol. 2 (1931), pp. 379-439.

2 For an account of these properties as well as an extensive bibliography the reader can
refer to one of two treatises, viz.: J. Shohat, Théorie Générale des Polynomes Orthogonaux de
Tchebichef, Mémorial des Sciences Mathématiques, Fascicule 66, Paris, Gauthier-Villars,
1930.

Gabor Szegö, Orthogonal Polynomials, Am. Math. Soc., Colloquium Publications, Vol.
23, 1939.

3 J. Shohat, "The relation of the classical orthogonal polynomials to the polynomials of
Appell," Am. Jour. of Math., Vol. 58 (1936), pp. 454-455.



$\cdots$, $i = 1, 2, 3, \cdots$ with real coefficients be orthogonal in the restricted sense it is
necessary and sufficient that there exist a recurrence relation,

(4) $$\Phi_i(x) = (x - c_i)\,\Phi_{i-1}(x) - \lambda_i\,\Phi_{i-2}(x), \qquad \Phi_0 = 1,\quad \Phi_1 = x - c_1,$$

$c_i$, $\lambda_i$ const, with all $\lambda_i > 0$, $i \ge 2$.

With Shohat⁴ we will say that a system of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} +
\cdots$, $i = 1, 2, 3, \cdots$, with real coefficients is orthogonal in the general sense if
there exists at least one weight function $\psi(x)$, of bounded variation over $(a, b)$ such
that (3) is satisfied. In connection with generalized orthogonality we have the
following theorem:⁴

Theorem 2. In order that the system $\Phi_i(x)$, $i = 1, 2, 3, \cdots$ be orthogonal in
the general sense it is necessary and sufficient that relation (4) be satisfied with all
$\lambda_i \ne 0$.

It is the purpose of this paper to investigate the orthogonality properties of
the general polynomials $P_n(n, x)$ given by (2). In Part 1 a general recurrence
relation is derived which applies to all the polynomials $P_n(k, x)$. In Part 2 all
the different types of orthogonal polynomials $P_n(n, x)$ are determined by making
use of the general recurrence relation derived in Part 1. We also show, following
lines laid down by Hahn⁵, that the only systems of polynomials with simple
zeros which are orthogonal in either the restricted or the general sense and whose
derivatives are orthogonal in either sense are the systems considered in Part 2.


1. The general recurrence relation. From (2) we can write,

(5) $$P_{n-1}(k, x) = \frac{D^{n-k-1}}{y}\frac{d^{n-1}}{dx^{n-1}}\left(D^k y\right) = \frac{D^{n-k-1}}{y}\frac{d^{n-1}}{dx^{n-1}}\left(D\cdot D^{k-1}y\right).$$

Apply Leibnitz Formula to the right side and make use of (2). There results,

(6) $$P_{n-1}(k, x) = P_{n-1}(k-1, x) + (n-1)D'P_{n-2}(k-1, x) + \frac{(n-1)(n-2)}{2}\,D''D\,P_{n-3}(k-1, x).$$

From Hildebrandt's paper we have,⁶

(7) $$P_{n+1}(k+1, x) = [N + (k+1)D']\,P_n(k, x) + n[N' + (k+1)D'']\,D\,P_{n-1}(k, x).$$

Decrease $k$ and $n$ each by one in (7) and obtain a relationship which we number
(8). Again decrease $n$ by one in (8) and get a relation which we number (9).


4 J. Shohat, "Sur les polynomes orthogonaux généralisés," Comptes Rendus, Vol. 207
(1938), p. 550.

5 Wolfgang Hahn, "Über die Jacobischen Polynome und zwei verwandte Polynomklassen,"
Math. Zeits., Vol. 39 (1934-35), pp. 634-638.

6 E. H. Hildebrandt, loc. cit. p. 407.





From (6), (7), (8) and (9) eliminate $P_{n-1}(k, x)$, $P_{n-2}(k-1, x)$, and $P_{n-3}(k-1, x)$.
There results,

(10) $$\begin{aligned}
&[2N' + (2k-n+1)D''][N' + kD'']\,P_{n+1}(k+1, x)\\
&\quad= \{[2N' + (2k-n+1)D''][N' + kD''][N + (k+1)D']\\
&\qquad\quad + n[N' + (k+1)D''][2N'D' + kD'D'' - ND'']\}\,P_n(k, x)\\
&\qquad + n[N' + (k+1)D'']\,\{2(N' + kD'')^2D\\
&\qquad\quad - (N + kD')(2N'D' + kD'D'' - ND'')\}\,P_{n-1}(k-1, x).
\end{aligned}$$

In (10) decrease $n$ and $k$ each by one and replace $N$ and $D$ by their values from
(1). Thus we get,

(11) $$\begin{aligned}
&[a_1 + (2k-n)b_2][a_1 + 2(k-1)b_2]\,P_n(k, x)\\
&\quad= \{[a_1 + (2k-2)b_2][a_1 + 2kb_2][a_1 + (2k-1)b_2]\,x\\
&\qquad\quad + [a_1 + (2k-2)b_2][a_1 + (2k-n)b_2][a_0 + kb_1]\\
&\qquad\quad + (n-1)[a_1 + 2kb_2][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}\,P_{n-1}(k-1, x)\\
&\qquad + (n-1)[a_1 + 2kb_2]\,\{b_0[a_1 + (2k-2)b_2]^2\\
&\qquad\quad - [a_0 + (k-1)b_1][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}\,P_{n-2}(k-2, x).
\end{aligned}$$

In this recurrence formula the $P_n(k, x)$ have in general a coefficient of $x^n$ different
from one. Polynomials which have one for the coefficient of $x^n$ we will refer
to in the future as normalized. Let us now transform (11) for normalized $P_n(k, x)$.
Theorem 1 deals with polynomials normalized in the above sense. Let us write,

$P_n(k, x) = a_{n,k}x^n - b_nx^{n-1} + \cdots$. In (4) set, $\Phi_n(x) = P_n(k, x)/a_{n,k}$.

Thus we get,

(12) $$P_n(k, x) = (A_nx - B_n)\,P_{n-1}(k-1, x) - \gamma_n\,P_{n-2}(k-2, x)$$

where

$$\gamma_n = \lambda_n\frac{a_{n,k}}{a_{n-2,k-2}}, \qquad A_n = \frac{a_{n,k}}{a_{n-1,k-1}}, \qquad \text{and}\qquad B_n = c_n\frac{a_{n,k}}{a_{n-1,k-1}}.$$

Relation (12) is essentially of the same form as (11). Each of these is to be
reduced to form (4).

From a previous paper by the author⁷ we have,

(13) $$P'_{n+1}(k, x) = (n+1)\left[N' + \tfrac{1}{2}(2k-n)D''\right]P_n(k, x).$$

$n - 1$ successive applications of this relation give us, $[P_0(k, x) = 1]$, that the
coefficient of $x^n$ in $P_n(k, x)$ is,

7 Frank S. Beale, "On the polynomials related to Pearson's differential equation,"
Annals of Math. Stat., Vol. 8 (1937), p. 207 (2).





(14) $$a_{n,k} = \prod_{i=0}^{n-1}\,[a_1 + (2k - n + 1 + i)b_2].$$

By employing (14) in (12) we see that (12) or (11) reduces to form (4) where,

(15) $$-c_n = \frac{[a_1 + (2k-n)b_2][a_0 + kb_1]}{[a_1 + 2kb_2][a_1 + (2k-1)b_2]} + \frac{(n-1)[a_1b_1 + (k-1)b_1b_2 - a_0b_2]}{[a_1 + (2k-1)b_2][a_1 + (2k-2)b_2]},$$

(16) $$\lambda_n = -(n-1)\,\frac{[a_1 + (2k-n-1)b_2]\,\{b_0[a_1 + (2k-2)b_2]^2 - [a_0 + (k-1)b_1][a_1b_1 + (k-1)b_1b_2 - a_0b_2]\}}{[a_1 + (2k-3)b_2][a_1 + (2k-2)b_2]^2[a_1 + (2k-1)b_2]}.$$

Equation (16) together with Theorems 1 and 2 can now be applied to the polynomials
$P_n(k, x)$.
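
The coefficients of the recurrence (4) can be evaluated mechanically for any choice of the constants in (1). The following Python sketch is an illustration, not part of the original paper. It assumes the forms $-c_n = \frac{[a_1+(2k-n)b_2][a_0+kb_1]}{[a_1+2kb_2][a_1+(2k-1)b_2]} + \frac{(n-1)e}{[a_1+(2k-1)b_2][a_1+(2k-2)b_2]}$ and $\lambda_n = -(n-1)\frac{[a_1+(2k-n-1)b_2]\{b_0[a_1+(2k-2)b_2]^2 - [a_0+(k-1)b_1]e\}}{[a_1+(2k-3)b_2][a_1+(2k-2)b_2]^2[a_1+(2k-1)b_2]}$ with $e = a_1b_1 + (k-1)b_1b_2 - a_0b_2$, as obtained by reducing (11) to form (4). The function name is hypothetical, and exact rational arithmetic is used so that the results can be compared with the classical values.

```python
from fractions import Fraction

def recurrence_coefficients(n, k, a0, a1, b0, b1, b2):
    """Return (c_n, lambda_n) of relation (4) for the normalized P_n(k, x).

    Raises ZeroDivisionError if one of the factors a_1 + j*b_2 in a
    denominator vanishes, i.e. when the normalization breaks down.
    """
    a0, a1, b0, b1, b2 = (Fraction(v) for v in (a0, a1, b0, b1, b2))

    def f(j):
        # the recurring factor a_1 + j*b_2
        return a1 + j * b2

    e = a1 * b1 + (k - 1) * b1 * b2 - a0 * b2
    c_n = -(f(2 * k - n) * (a0 + k * b1) / (f(2 * k) * f(2 * k - 1))
            + (n - 1) * e / (f(2 * k - 1) * f(2 * k - 2)))
    lam_n = (-(n - 1) * f(2 * k - n - 1)
             * (b0 * f(2 * k - 2) ** 2 - (a0 + (k - 1) * b1) * e)
             / (f(2 * k - 3) * f(2 * k - 2) ** 2 * f(2 * k - 1)))
    return c_n, lam_n

# Legendre case: N = 0, D = 1 - x^2, with k = n = 3
legendre = recurrence_coefficients(3, 3, 0, 0, 1, 0, -1)
# Laguerre case with alpha = 2: N = 2 - x, D = x, with k = n = 3
laguerre = recurrence_coefficients(3, 3, 2, -1, 0, 1, 0)
# Hermite case with h = -1: N = -x, D = 1, with k = n = 4
hermite = recurrence_coefficients(4, 4, 0, -1, 1, 0, 0)
print(legendre, laguerre, hermite)
```

With $k = n$ these three calls correspond to the Legendre, Laguerre and Hermite cases treated in Part 2, so the values returned can be set against the $\lambda_n$ given there.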

From (14) it is seen that $P_n(k, x)$ is of degree $n$ provided that none of the factors
of the product vanishes. This condition we assume to hold here for all $n$.

We can now obtain a recurrence relation for the $q$th derivatives of $P_n(k, x)$.
A repeated application of (13) leads to,

(17) $$\frac{d^q}{dx^q}P_n(k, x) = P_{n-q}(k, x)\prod_{i=0}^{q-1}\,(n - i)\,[a_1 + (2k - n + i + 1)b_2],$$

where $P_n(k, x)$ is not normalized in the above sense. By considering the right
side of (17) together with (14) we see that (17) can be divided by

$$a_{n-q,k}\prod_{i=0}^{q-1}\,(n - i)\,[a_1 + (2k - n + i + 1)b_2]$$

and thus normalize the polynomials on both the right and left sides of (17).
Consequently the recurrence relation for normalized $d^q[P_n(k, x)]/dx^q$, $n =
0, 1, 2, \cdots$, is identical with the recurrence relation for normalized $P_{n-q}(k, x)$
as given by (4), (15) and (16) when we replace $n$ by $n - q$ in these latter.


2. The different types of orthogonal $P_n(n, x)$. Suppose first that $b_2 \ne 0$ in
(1). A transformation on $x$ with real coefficients can be effected which changes
(1) into either,

(18) $$\frac{1}{y}\frac{dy}{dx} = \frac{(\alpha - \beta) + (-\alpha - \beta)x}{1 - x^2}$$

or

(19) $$\frac{1}{y}\frac{dy}{dx} = \frac{-2mx - q}{a^2 + x^2}.$$

(A) Equation (18) together with (2) for $k = n$ defines the generalized Jacobi
Polynomials (normalized in the above sense),

$$J_n(x, \alpha, \beta) = \frac{1}{a_{n,n}}(1 + x)^{-\alpha}(1 - x)^{-\beta}\frac{d^n}{dx^n}\left[(1 + x)^{n+\alpha}(1 - x)^{n+\beta}\right],$$




where $a_{n,n}$ is given by (14). If in (16) we set $k = n$ and make proper replacements
for all constants as (18) and (1) show we have,

(20) $$\lambda_n = \frac{4(n-1)(\alpha + \beta + n - 1)(\alpha + n - 1)(\beta + n - 1)}{(\alpha + \beta + 2n - 3)(\alpha + \beta + 2n - 2)^2(\alpha + \beta + 2n - 1)}, \qquad n \ge 2.$$

From Theorem 1 and this value of $\lambda_n$ we conclude that if $\alpha > -1$, $\beta > -1$,
the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal in the restricted sense, a well-known
result. From Theorem 2 we can similarly conclude that if neither $\alpha$, $\beta$, nor
$(\alpha + \beta)$ equals $-j$, $j$ a positive integer, the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal
in the general sense.
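
The two conclusions can be illustrated by evaluating $\lambda_n$ for particular parameters. The following Python sketch is an illustration only: it assumes the form $\lambda_n = \frac{4(n-1)(\alpha+\beta+n-1)(\alpha+n-1)(\beta+n-1)}{(\alpha+\beta+2n-3)(\alpha+\beta+2n-2)^2(\alpha+\beta+2n-1)}$ for (20), and it inspects only finitely many $\lambda_n$, so it illustrates Theorems 1 and 2 rather than proving anything. The function names and the cut-off `n_max` are hypothetical.

```python
from fractions import Fraction

def jacobi_lambda(n, alpha, beta):
    """lambda_n of (20) for the normalized Jacobi polynomials, n >= 2."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    s = alpha + beta
    num = 4 * (n - 1) * (s + n - 1) * (alpha + n - 1) * (beta + n - 1)
    den = (s + 2 * n - 3) * (s + 2 * n - 2) ** 2 * (s + 2 * n - 1)
    return num / den

def classify(alpha, beta, n_max=12):
    """Apply the sign conditions of Theorems 1 and 2 to lambda_2, ..., lambda_{n_max}."""
    lams = []
    for n in range(2, n_max + 1):
        try:
            lams.append(jacobi_lambda(n, alpha, beta))
        except ZeroDivisionError:
            return "undefined"          # a normalizing factor vanishes
    if all(lam > 0 for lam in lams):
        return "restricted"
    if all(lam != 0 for lam in lams):
        return "general"
    return "neither"

print(classify(0, 0))                     # Legendre parameters
print(classify(Fraction(-3, 2), 0))       # alpha below -1, not a negative integer
print(classify(-2, Fraction(1, 2)))       # alpha a negative integer
```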

(A₁) If in (18) we set $\alpha = \beta = 0$ we obtain a differential equation which
together with (2) for $k = n$ leads to the Legendre Polynomials, (normalized in
above sense), $P_n(x) = \dfrac{n!}{(2n)!}\dfrac{d^n}{dx^n}(x^2 - 1)^n$. Setting $\alpha = \beta = 0$ in (20) leads to
$\lambda_n = \dfrac{(n-1)^2}{(2n-3)(2n-1)}$, $n \ge 2$. Thus from Theorem 1 we conclude that the
Legendre Polynomials are orthogonal in the restricted sense, a result well known.
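
As a check on this value of $\lambda_n$, the normalized Legendre Polynomials can be built from the recurrence (4) and their orthogonality over $(-1, 1)$ verified exactly. The Python sketch below is an illustration, not part of the original paper: it assumes $c_n = 0$ and $\lambda_n = (n-1)^2/((2n-3)(2n-1))$, represents a polynomial by its list of coefficients (lowest power first), and uses hypothetical helper names.

```python
from fractions import Fraction

def poly_mul_x(p):
    """Multiply a polynomial (coefficient list, lowest power first) by x."""
    return [Fraction(0)] + p

def poly_sub(p, q):
    """Return p - q for coefficient lists of possibly different lengths."""
    m = max(len(p), len(q))
    p = p + [Fraction(0)] * (m - len(p))
    q = q + [Fraction(0)] * (m - len(q))
    return [u - v for u, v in zip(p, q)]

def monic_legendre(n_max):
    """Phi_0, ..., Phi_{n_max} from Phi_n = x*Phi_{n-1} - lambda_n*Phi_{n-2}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]   # Phi_0 = 1, Phi_1 = x
    for n in range(2, n_max + 1):
        lam = Fraction((n - 1) ** 2, (2 * n - 3) * (2 * n - 1))
        scaled = [lam * u for u in polys[n - 2]]
        polys.append(poly_sub(poly_mul_x(polys[n - 1]), scaled))
    return polys

def inner(p, q):
    """Exact integral of p(x) q(x) over (-1, 1); odd powers integrate to zero."""
    total = Fraction(0)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            if (i + j) % 2 == 0:
                total += u * v * Fraction(2, i + j + 1)
    return total

polys = monic_legendre(5)
print(polys[2], polys[3])
print([inner(polys[m], polys[n]) for m in range(6) for n in range(m)])
```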


(B) Equation (19) together with (2) for $k = n$ leads to a class of polynomials
(normalized in above sense), mentioned by Romanovsky.⁸

$$R_n(x, m, q, a) = \frac{1}{a_{n,n}}(a^2 + x^2)^m\exp\left(\frac{q}{a}\tan^{-1}\frac{x}{a}\right)\frac{d^n}{dx^n}\left[(a^2 + x^2)^{n-m}\exp\left(-\frac{q}{a}\tan^{-1}\frac{x}{a}\right)\right]$$

where again $a_{n,n}$ is given by (14). In (16) set $k = n$ and make the proper
replacements of constants and,

$$\lambda_n = \frac{(n-1)(2m - n + 1)\{4a^2(m - n + 1)^2 + q^2\}}{4(2m - 2n + 3)(m - n + 1)^2(2m - 2n + 1)}.$$

From Theorem 2 it now follows that the sequence $\{R_n(x, m, q, a)\}$ is orthogonal
in the general sense if $m \ne j/2$, $j$ a positive integer. There is no set of parameters
$m$, $q$, $a$ which assures orthogonality in the restricted sense.

In connection with Romanovsky's note there appear to be several discrepancies.
For the weight functions given there under types IV and V, the $n$th
moments for sufficiently large $n$ do not exist over the intervals there considered.
Type V is the special case of type IV for $a = 0$. Type VI is none other than
Jacobi Polynomials so that the orthogonality relations given there for this case
are incorrect. In all three types listed certain of the recurrence relations for
the polynomials are in error.

(B₁) We note here one special sub-class of $R_n$. Take $m = q = 0$ and $a = 1$
in (19). We obtain from (2) and (14) a system of normalized polynomials
analogous to the Legendre Polynomials namely, $\phi_n(x) = \dfrac{n!}{(2n)!}\dfrac{d^n}{dx^n}(x^2 + 1)^n$.


8 V. Romanovsky, "Sur quelques classes nouvelles de polynomes orthogonaux," Comptes
Rendus, Vol. 188 (1929), pp. 1023-1025.





It is easy to verify for these that,

$$\int_{-i}^{i}\phi_m(x)\,\phi_n(x)\,dx = 0, \qquad m \ne n, \quad i = \sqrt{-1}.$$

(C) Suppose that in (1), $b_2 = 0$, $b_1 \ne 0$. A linear transformation with real
coefficients changes (1) into, $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{\alpha - x}{x}$. This equation together with (2)
and (14) for $k = n$ defines the generalized Laguerre Polynomials, (normalized
in above sense), $L_n(x, \alpha) = (-1)^n x^{-\alpha}e^{x}\dfrac{d^n}{dx^n}\left[x^{n+\alpha}e^{-x}\right]$. Setting $k = n$ and making
proper replacements in (16) we get, $\lambda_n = (n - 1)(\alpha + n - 1)$, $n \ge 2$. From
Theorem 1 we see that if $\alpha > -1$ the $L_n$ are orthogonal in the restricted sense,
a well-known result. From Theorem 2 we can say that if $\alpha \ne -j$, $j$ a positive
integer, the polynomials are orthogonal in the general sense.

(D) If in (1), $b_2 = b_1 = 0$, $b_0 \ne 0$ we can perform a linear transformation on
$x$ with real coefficients and get, $\dfrac{1}{y}\dfrac{dy}{dx} = hx$. This differential equation together
with (2) and (14) gives a set of normalized polynomials $G_n(x) = \dfrac{1}{h^n}\,e^{-hx^2/2}\dfrac{d^n}{dx^n}\,e^{hx^2/2}$.
Taking $k = n$ and making proper substitutions for constants in (16) we get
$\lambda_n = -(n - 1)/h$, $n \ge 2$. If $h$ is negative it follows from Theorem 1 that the
sequence $\{G_n(x)\}$ is orthogonal in the restricted sense. In fact, $G_n(x) \equiv$
Hermite Polynomials.

On the other hand, if $h$ is positive we have from Theorem 2 orthogonality in
the general sense. In fact, it can be easily verified for this case that,

$$\int_{-i\infty}^{i\infty} e^{hx^2/2}\,G_n(x)\,G_m(x)\,dx = 0, \qquad m \ne n, \quad i = \sqrt{-1}.$$


(E) The only remaining possibility for (1) not so far discussed occurs when
$N$ = constant and $D$ is linear. In this case it has been shown that $P_n(k, x)$
of (2) reduces to a constant.⁹

E. H. Hildebrandt has shown¹⁰ that the polynomials $P_n(n, x)$ of (2) satisfy
a differential equation of the form,

(21) $$(b_0 + b_1x + b_2x^2)\frac{d^2y}{dx^2} + [(a_0 + b_1) + (a_1 + 2b_2)x]\frac{dy}{dx} - n[a_1 + (n + 1)b_2]\,y = 0, \qquad n = 1, 2, 3, \cdots.$$

Moreover with the coefficients of $d^2y/dx^2$ and $dy/dx$ in (21) he has shown that
for (21) to have a polynomial solution of degree $n$ the coefficient of $y$ must be
of the form given in (21).


• Frank S. Beale, loc. oit. p. 20Q, Theorem I>, 
10 Loc. cit. pp. 404-405.





From (16) we can say that for $k = n$ and an orthogonal sequence $P_n(n, x)$,
$n = 0, 1, 2, \cdots$ we have,

(22) $$a_1 + (n - 1)b_2 \ne 0,$$

(23) $$b_0[a_1 + (2n - 2)b_2]^2 - [a_0 + (n - 1)b_1][a_1b_1 + (n - 1)b_1b_2 - a_0b_2] \ne 0,$$

where $n$ is an integer $\ge 2$. Considering for (21) a solution of the type $y =
\sum_{i=0}^{\infty} c_ix^i$ we readily show that, if (22) and (23) are satisfied, (21) possesses for
each $n$ a single polynomial solution of degree $n$. Two solutions which differ
merely by a constant factor are regarded as the same solution. This polynomial
solution of (21) must be $P_n(n, x)$.

By employing theorems from a previous paper by the author¹¹ we can show
that if (22) and (23) are satisfied, the zeros of the polynomials of section 2 are
simple whether these zeros are real or complex.

Hahn has shown¹² that if a set of normalized polynomials and their derivatives
satisfy a relation of the form (4) with $\lambda_i \ne 0$ and if the zeros of the polynomials
are all simple then the polynomials must necessarily satisfy an equation
of form (21). Since in this paper we have considered all possible values of
$a_i$, $(i = 0, 1)$, and $b_i$, $(i = 0, 1, 2)$, which lead to orthogonal polynomials, it
follows that the only systems of polynomials with simple zeros and orthogonal
in either restricted or general sense whose derivatives in turn are orthogonal in
either sense are the systems of section 2.

11 Lor. rit. I*j). 207’200, Tlieortmm h lo he. 
12 Loc. cit. pp. 634-638.



THE SKEWNESS OF THE RESIDUALS IN LINEAR REGRESSION THEORY

By P. S. Dwyer

University of Michigan, Ann Arbor, Mich.

In obtaining the regression of $y$ on $x$ it is customary to show the relation
between the actual and the estimated $y$ by computing the standard deviation
of the residuals with the use of the formula $\sigma_e = \sigma_y\sqrt{1 - r^2}$. If the errors
are distributed normally one may estimate the number of values coming within
one standard deviation, within two standard deviations, etc., of the regression
line. However these errors are not always distributed normally, and in such
a case it seems wiser to compute the skewness of the residuals and to use a
Pearson Type III curve in making the interpretation. The present paper outlines
a technique for the calculation of $\alpha_{3:e}$ which is feasible from a practical
standpoint. It is based (a) on a cumulative totals method of obtaining the
correlation coefficient which, at the same time, makes possible the determination
of the third order moments needed to evaluate the skewness and (b) on an efficient
ritual for computing the coefficient of skewness from the moments.

The determination of the normality or non-normality of the residuals is not
always immediately evident. If the scatter diagram or correlation chart is
presented, one can make an estimate of the extent of normality but if not, and
the most modern and efficient computational methods do not utilize the correlation
chart, there is no way by which the presence or absence of normality can
be detected. Some research workers are opposed to the use of the more efficient
methods (particularly the use of the Hollerith tabulators) because the correlation
chart is not presented. Though within limits it is possible to use the
tabulator to present the correlation chart simultaneously with the values needed
to compute the correlation coefficient [1], it is here suggested that the computation
of the skewness of residuals, which can now be accomplished quite easily
from the tabulator runs, may be substituted for the examination of the correlation
chart.

The classical least squares theory makes use of

(1) $$e = y - b_0 - b_1x$$

where $b_0$ and $b_1$ are the solutions of the normal equations. We note that the
first normal equation is $\Sigma e = 0$ so that $M_e = 0$ and the residual is a deviation.
It follows that the skewness of residuals is

(2) $$\alpha_{3:e} = \frac{\Sigma(y - b_0 - b_1x)^3}{N\sigma_e^3}.$$



We wish to compute $\alpha_{3:e}$ without computing the individual residuals. The
denominator causes us little concern but it seems discouraging to evaluate such
an expression as

$$\Sigma y^3 - Nb_0^3 - b_1^3\Sigma x^3 - 3b_0\Sigma y^2 - 3b_1\Sigma xy^2 + 3b_0^2\Sigma y - 3b_0^2b_1\Sigma x + 3b_1^2\Sigma x^2y - 3b_0b_1^2\Sigma x^2 + 6b_0b_1\Sigma xy$$

even though the values of $b_0$, $b_1$, $N$, $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$, $\Sigma x^2y$, $\Sigma xy^2$,
$\Sigma y^3$ are available.

A first simplification is made by summing (1) and dividing by $N$. We then
have

(3) $$M_e = M_y - b_0 - b_1M_x$$

and by subtracting (3) from (1) and denoting deviations by barred letters,
we have

(4) $$\bar{e} = \bar{y} - b_1\bar{x}$$

so that the skewness of errors is

(5) $$\alpha_{3:e} = \frac{\Sigma\bar{y}^3 - 3b_1\Sigma\bar{x}\bar{y}^2 + 3b_1^2\Sigma\bar{x}^2\bar{y} - b_1^3\Sigma\bar{x}^3}{N\sigma_e^3}.$$


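This formula can also be expressed as

(6) $$\alpha_{3:e} = \frac{\mu_{03} - 3b_1\mu_{12} + 3b_1^2\mu_{21} - b_1^3\mu_{30}}{(\mu_{02} - b_1\mu_{11})^{3/2}}.$$

A similar formula for the skewness of the residuals of $x$ on $y$ is

(7) $$\alpha_{3:e'} = \frac{\mu_{30} - 3b_1'\mu_{21} + 3b_1'^2\mu_{12} - b_1'^3\mu_{03}}{(\mu_{20} - b_1'\mu_{11})^{3/2}}.$$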
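
The moment form of the skewness can be checked against a direct computation from the individual residuals. The following Python sketch is an illustration, not part of the original paper. It assumes $\mu_{ij} = \Sigma\bar{x}^i\bar{y}^j/N$, $b_1 = \mu_{11}/\mu_{20}$ and the moment formula $\alpha_{3:e} = (\mu_{03} - 3b_1\mu_{12} + 3b_1^2\mu_{21} - b_1^3\mu_{30})/(\mu_{02} - b_1\mu_{11})^{3/2}$; the small data set and the function names are hypothetical.

```python
def central_moment(xs, ys, i, j):
    """mu_ij: mean of (x - M_x)^i (y - M_y)^j over the N pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) ** i * (y - my) ** j for x, y in zip(xs, ys)) / n

def residual_skewness_from_moments(xs, ys):
    """Skewness of the residuals of y on x from second and third order moments."""
    def mu(i, j):
        return central_moment(xs, ys, i, j)
    b1 = mu(1, 1) / mu(2, 0)
    num = mu(0, 3) - 3 * b1 * mu(1, 2) + 3 * b1 ** 2 * mu(2, 1) - b1 ** 3 * mu(3, 0)
    return num / (mu(0, 2) - b1 * mu(1, 1)) ** 1.5

def residual_skewness_direct(xs, ys):
    """Same quantity computed from the individual residuals e = y - b0 - b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = central_moment(xs, ys, 1, 1) / central_moment(xs, ys, 2, 0)
    b0 = my - b1 * mx
    e = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    m2 = sum(v ** 2 for v in e) / n
    m3 = sum(v ** 3 for v in e) / n
    return m3 / m2 ** 1.5

# Hypothetical illustrative data, not taken from the paper.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 0, 2, 2, 5, 3, 9, 6]
from_moments = residual_skewness_from_moments(xs, ys)
direct = residual_skewness_direct(xs, ys)
print(from_moments, direct)
```

The agreement of the two values shows why only the moments, and not the residuals themselves, are needed.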

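For theoretical purposes formula (6) may be put in standard units with
$b_1 = r\sigma_y/\sigma_x$, $b_1' = r\sigma_x/\sigma_y$, $\mu_{30} = \alpha_{30}\sigma_x^3$, $\mu_{21} = \alpha_{21}\sigma_x^2\sigma_y$, etc. with the resulting

(8) $$\alpha_{3:e} = \frac{\alpha_{03} - 3r\alpha_{12} + 3r^2\alpha_{21} - r^3\alpha_{30}}{(1 - r^2)^{3/2}}.$$

As $r \to 0$, $\alpha_{3:e} \to \alpha_{3:y}$ just as $\sigma_e \to \sigma_y$ as $r \to 0$.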

Formulas (6) and (7) are of some theoretical importance in that they show how
the skewness of the residuals is connected with the skewness of the marginal
distribution. Thus

as $\mu_{11} \to 0$, $b_1$ and $b_1' \to 0$ and $\alpha_{3:e} \to \alpha_{3:y}$, $\alpha_{3:e'} \to \alpha_{3:x}$;
as $b_1 \to \infty$, $\alpha_{3:e} \to -\alpha_{3:x}$ and as $b_1' \to \infty$, $\alpha_{3:e'} \to -\alpha_{3:y}$;
as $b_1 \to 1$, $\alpha_{3:e} \to \alpha_{3:y-x}$. Similarly as $b_1' \to 1$, $\alpha_{3:e'} \to \alpha_{3:x-y}$.





It is hence possible in some cases to get a good approximation to the skewness
of the residuals if the regression coefficients and the skewness of the marginal
distribution are known.


TABLE I

Correlation from first order cumulations

[Two-way frequency table of the 1,126 students. Column (1) gives the class of Y, column (2) the class value y = 6, 5, ..., 0 and column (3) the frequency f_y; columns (4) to (12) give the cell frequencies for the classes of X, with class values x = 8, 7, ..., 0; columns (13) and (14) give the cumulations Cx_y and Cy_y. Further rows give f_x, Cy_x and Cx_x.]



For actual computation, we use (6) and (7). It has been indicated previously
how the values $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$ and $\Sigma y^3$ could be obtained with the
use of cumulations. An illustration used previously [2] is presented in Table I.
The information was obtained from the Office of Educational Investigations of
the University of Michigan and gives the University first semester average (X)
and the high school average (Y) for 1,126 students entering the College of Literature,
Science, and the Arts in 1928.

The new origin of each variable is taken at the class mark of the lowest class 
rather than at the class mark of a middle class as is conventional. In this way 
all negative terms are avoided in the computation of the moments. The x's are 
arranged in descending order from left to right and the y’s in descending order 
from top to bottom. The notation $x_y$ is used to indicate the sum of all the x's

SKEWNESS OF RESIDUALS


having the same value of y. Thus the first entry in column 13 is 5·8 + 2·7 +
5·6 + 5·5 + 1·4 = 113. The column $Cx_y$ is obtained by cumulating the values
of $x_y$. Similarly $y_y$ is the sum of all the y's having the same value and the first
entry in column 11 is 18(6) = 108. The entries $Cy_y$, $Cy_x$, and $Cx_x$ are obtained
similarly.

The entries $\Sigma x$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$ are found in the lower right hand box in this
position:


; 2i j 2y 

2?/; Zxy IV 
lx ; 2s s ; 

The values of $\Sigma x$ and $\Sigma y$ are obtained from the final cumulations while the value
of $\Sigma xy$ is obtained by adding the entries in the column above, or, as a check,
the entries in the row to the left. The value of $\Sigma x^2$ is obtained by adding the
entries in the row at the left of the box while the value $\Sigma y^2$ is obtained by adding
the entries above the box.

The values of the third order sums are obtained by multiplying the entries
above the box and to the left of the box successively by 1, 3, 5, 7, 9, etc. Thus,

$\Sigma x^3 = 4399 + 3(4339) + 5(4097) + \text{etc.} = 102{,}103$,

$\Sigma x^2y = 2923 + 3(2809) + 5(2578) + \text{etc.} = 63{,}121$,

(9)

$\Sigma xy^2 = 4244 + 3(3714) + 5(2568) + \text{etc.} = 46{,}047$,

$\Sigma y^3 = 2993 + 3(2820) + 5(2160) + \text{etc.} = 38{,}633$.
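A minimal sketch of the cumulation rule, assuming coded values 0, 1, 2, ... and using the marginal frequencies of y as we read them from Table I (51, 173, 330, 270, 178, 106 and 18 for codes 0 to 6; these readings and the function name are assumptions of the sketch). The cumulations are added directly for the second order sum and weighted by 1, 3, 5, ... for the third order sum.

```python
def power_sums_by_cumulation(freq_by_code):
    """Return (sum y, sum y^2, sum y^3) for coded values 0, 1, 2, ...

    freq_by_code[k] is the frequency of the coded value k.
    """
    top = len(freq_by_code) - 1
    # First order sums k * f_k, cumulated from the highest class downward.
    cumulations = []
    running = 0
    for k in range(top, 0, -1):
        running += k * freq_by_code[k]
        cumulations.append(running)
    # The final cumulation is the first order sum itself.
    sum_y = cumulations[-1]
    # Adding all the cumulations gives the second order sum.
    sum_y2 = sum(cumulations)
    # Weights 1, 3, 5, ... start at the final cumulation and move upward.
    odd_weights = range(1, 2 * len(cumulations), 2)
    sum_y3 = sum(w * c for w, c in zip(odd_weights, reversed(cumulations)))
    return sum_y, sum_y2, sum_y3


# Assumed marginal frequencies of y for codes 0..6 (our reading of Table I).
Y_FREQ = [51, 173, 330, 270, 178, 106, 18]
print(power_sums_by_cumulation(Y_FREQ))
```

With these frequencies the function returns 2993 and 38,633 for the first and third order sums, the values quoted in (9) for the y's.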

In making the reductions we use ab - cd operations as much as possible.
We first compute

$\Delta_{x,y} = N\Sigma xy - (\Sigma x)(\Sigma y)$,

(10) $\Delta_{x,x} = N\Sigma x^2 - (\Sigma x)^2$,

$\Delta_{x^2,y} = N\Sigma x^2y - (\Sigma x^2)(\Sigma y)$.

We note too that

(11)

$\mu_{30} = [N\Delta_{x^2,x} - (2\Sigma x)(\Delta_{x,x})]/N^3$, $\mu_{21} = [N\Delta_{x^2,y} - (2\Sigma x)(\Delta_{x,y})]/N^3$,

$\mu_{12} = [N\Delta_{y^2,x} - (2\Sigma y)(\Delta_{x,y})]/N^3$, $\mu_{03} = [N\Delta_{y^2,y} - (2\Sigma y)(\Delta_{y,y})]/N^3$,

and finally we get $\alpha_{3:\epsilon}$ or $\alpha_{3:\epsilon'}$ by (6) or (7).

The general solution is outlined on the left of Table II. We record in Fig. A
the values given by (9) and in Fig. B the values resulting from the application
of (10). The values $2\Sigma y$ and $2\Sigma x$ are inserted in Fig. B to facilitate the
calculation of Fig. C, which gives the values of (11). The technique is very
easily carried out once it is understood. It can be performed with hand calculators
but it is ideally adapted to the use of the latest Marchant, Friden, and
Monroe models equipped with automatic positive and negative multiplication,
so that ab - cd operations can be performed with a minimum of effort and a maximum
of accuracy. Actually the value of "a," which is the total frequency, is
the same for many of these operations so that there is further saving if a machine
is used which permits the locking in of a constant in such a way that it
can be used, without continued key punching, in later ab - cd operations.
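The reductions (10) and (11) can be sketched as a chain of ab - cd operations. The function names are hypothetical, and the input sums are our reading of Fig. A of Table II; in particular $\Sigma x^2 = 20245$ and $\Sigma xy = 12815$ are hard to read in the source and are assumed here because they reproduce the entries of Fig. C quoted later in the text.

```python
def ab_minus_cd(a, b, c, d):
    """One machine operation of the form a*b - c*d."""
    return a * b - c * d


def third_order_reduction(N, Sx, Sy, Sx2, Sxy, Sy2, Sx3, Sx2y, Sxy2, Sy3):
    """Return the deltas of Fig. B and N^3 times the third central moments of Fig. C."""
    deltas = {
        "x,x": ab_minus_cd(N, Sx2, Sx, Sx),
        "x,y": ab_minus_cd(N, Sxy, Sx, Sy),
        "y,y": ab_minus_cd(N, Sy2, Sy, Sy),
        "x2,x": ab_minus_cd(N, Sx3, Sx2, Sx),
        "x2,y": ab_minus_cd(N, Sx2y, Sx2, Sy),
        "y2,x": ab_minus_cd(N, Sxy2, Sy2, Sx),
        "y2,y": ab_minus_cd(N, Sy3, Sy2, Sy),
    }
    # Formula (11): the constant N is reused as "a" in every operation.
    n3_mu = {
        "30": ab_minus_cd(N, deltas["x2,x"], 2 * Sx, deltas["x,x"]),
        "21": ab_minus_cd(N, deltas["x2,y"], 2 * Sx, deltas["x,y"]),
        "12": ab_minus_cd(N, deltas["y2,x"], 2 * Sy, deltas["x,y"]),
        "03": ab_minus_cd(N, deltas["y2,y"], 2 * Sy, deltas["y,y"]),
    }
    return deltas, n3_mu


# Assumed readings of Fig. A (coded values).
deltas, n3_mu = third_order_reduction(
    N=1126, Sx=4399, Sy=2993, Sx2=20245, Sxy=12815, Sy2=10069,
    Sx3=102103, Sx2y=63121, Sxy2=46047, Sy3=38633,
)
print(deltas["x,x"], deltas["x,y"], n3_mu["30"], n3_mu["21"], n3_mu["12"])
```

Integer arithmetic is used throughout, so the sketch mirrors the exact machine computation rather than a floating-point approximation.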

TABLE II

Abbreviated techniques for computing third order central moments, etc.

Fig. A.

    $N$             $\Sigma x$        $\Sigma x^2$      $\Sigma x^3$
    $\Sigma y$      $\Sigma xy$       $\Sigma x^2y$
    $\Sigma y^2$    $\Sigma xy^2$
    $\Sigma y^3$

    1126            4399              20245             102103
    2993            12815             63121
    10069           46047
    38633

Fig. B.

    $N$                 $2\Sigma x$         $\Delta_{x,x}$      $\Delta_{x^2,x}$
    $2\Sigma y$         $\Delta_{x,y}$      $\Delta_{x^2,y}$
    $\Delta_{y,y}$      $\Delta_{y^2,x}$
    $\Delta_{y^2,y}$

    1126                8798                3444669             25910223
    5986                1263483             10480961
    2379645             7555391
    13364241

Fig. C.

    $N$                                     $\Delta_{x,x}$      $N^3\mu_{30}$
                        $\Delta_{x,y}$      $N^3\mu_{21}$
    $\Delta_{y,y}$      $N^3\mu_{12}$
    $N^3\mu_{03}$

    1126                                    3444669             -1131286764
                        1263483             685438652
    2379645             944161028
    803580390

Fig. D.

    $N$             ($b_1$)         $\mu_{20}$      $\mu_{30}$
    ($b_1'$)        $\mu_{11}$      $\mu_{21}$      ($-b_1^3$), ($-3b_1'$)
    $\mu_{02}$      $\mu_{12}$      ($3b_1^2$), ($3b_1'^2$)
    $\mu_{03}$      ($-3b_1$), ($-b_1'^3$)

    1126            (.367)          2.717           -.7925
    (.531)          .997            .4801           (-1.593)
    1.877           .6614           (.846)
    .5629           (-.150)


The values in Fig. D are obtained by dividing the values $\Delta_{y,y}$, $\Delta_{x,y}$, and
$\Delta_{x,x}$ in Fig. C by $N^2$ and the values in the diagonal below, $N\Delta_{y^2,y} - (2\Sigma y)\Delta_{y,y}$,
etc., by $N^3$. The values $b_1 = \mu_{11}/\mu_{20}$ and $b_1' = \mu_{11}/\mu_{02}$ can be inserted in Fig. D adjacent
to the N. The value of the correlation coefficient is $r = \sqrt{b_1b_1'} = \mu_{11}/\sqrt{\mu_{20}\mu_{02}}$.





We have too $\sigma_\epsilon = \sqrt{\mu_{02} - b_1\mu_{11}}$ and $\sigma_{\epsilon'} = \sqrt{\mu_{20} - b_1'\mu_{11}}$ so that the standard
deviation of the residuals is readily computed from the entries of Fig. D. The
numerator of (6) is readily obtained after entering $-3b_1$, $3b_1^2$, $(-b_1^3)$ in the
diagonal under the diagonal containing the third moments and multiplying by
columns. The numerator of (7) is obtained by entering $-b_1'^3$, $3b_1'^2$, $-3b_1'$
in the same diagonal and multiplying by rows. The theory is applied to the
results of Table I and the details are presented at the right of Table II. It is
to be noted that all values indicated here are the coded values x, y and not the
original values X, Y. However, the correlation coefficient and the skewness
of errors are independent of any such change in unit, grouping errors being
neglected.

From Fig. D we see that $b_1 = .997/2.717 = .367$, that $b_1' = .997/1.877 = .531$
and that $r = \sqrt{(.367)(.531)} = .441$. In this case we wish to estimate
college record, x, from high school record, y, so we use $b_1' = .531$ and compute
$-3b_1' = -1.593$, $3b_1'^2 = .846$, $-b_1'^3 = -.150$. It follows that

$\alpha_{3:\epsilon'} = \dfrac{-.7925 + (.4801)(-1.593) + (.6614)(.846) + (.5629)(-.150)}{(2.717 - .531(.997))^{3/2}} = -.334$.
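The computation from Fig. D can be sketched in a few lines. The function name is hypothetical, and the third moments passed in below (-.7925, .4801, .6614, .5629) are our reading of Fig. D, taken as assumptions; they agree with the entries of Fig. C divided by $N^3$.

```python
def residual_skewness(mu20, mu11, mu02, mu30, mu21, mu12, mu03):
    """Return (skewness of residuals of y on x, skewness of residuals of x on y)."""
    b1 = mu11 / mu20        # regression coefficient of y on x
    b1_prime = mu11 / mu02  # regression coefficient of x on y
    # Residuals of y on x: third moment over the 3/2 power of the residual variance.
    skew_y_on_x = (
        mu03 - 3 * b1 * mu12 + 3 * b1 ** 2 * mu21 - b1 ** 3 * mu30
    ) / (mu02 - b1 * mu11) ** 1.5
    # Residuals of x on y: the same expression with the roles of x and y exchanged.
    skew_x_on_y = (
        mu30 - 3 * b1_prime * mu21 + 3 * b1_prime ** 2 * mu12 - b1_prime ** 3 * mu03
    ) / (mu20 - b1_prime * mu11) ** 1.5
    return skew_y_on_x, skew_x_on_y


# Assumed readings of Fig. D (coded units).
_, skew_college_on_school = residual_skewness(
    mu20=2.717, mu11=0.997, mu02=1.877,
    mu30=-0.7925, mu21=0.4801, mu12=0.6614, mu03=0.5629,
)
print(round(skew_college_on_school, 3))
```

Carrying the regression coefficient at full precision instead of rounding it to .531 changes the result only beyond the third decimal place.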


It thus appears that a better picture of the variation of the residuals in this
case is obtained with the use of a Pearson Type III with $\alpha_3$ approximately $-\tfrac{1}{3}$
than is obtained with the use of a normal curve. It is not necessary, of course,
to form Fig. D as the results can all be obtained from Fig. C. Thus if we
multiply the numerator and denominator of (7) by $N^3$, we get entries, with the
exception of the b's, which are in Fig. C. Now in this case $b_1 = \Delta_{x,y}/\Delta_{x,x}$
and $b_1' = \Delta_{x,y}/\Delta_{y,y}$ so that these values can be inserted in the upper left as before. Also the
powers of $b_1'$ can be inserted in the lower right as in Fig. D. We have then

$\alpha_{3:\epsilon'} = \dfrac{-1{,}131{,}286{,}764 + (685{,}438{,}652)(-1.593) + (944{,}161{,}028)(.846) + (803{,}580{,}390)(-.150)}{[3{,}444{,}669 - (1{,}263{,}483)(.531)]^{3/2}}$.

We know however, since the grades were coded, that it is not sensible to carry
results to more than three places (and, indeed, a three place determination of
the skewness is very satisfactory for interpretive purposes even though more
places might be obtained), so we cut down the number of places. The division
of numerator and denominator by $10^6$, and the dropping of the decimals results in

$\alpha_{3:\epsilon'} = \dfrac{-1131 + 685(-1.593) + 944(.846) + 804(-.150)}{[344 - 126(.531)]^{3/2}}$.

It is possible of course to duplicate the theory indicated in Table II with the
use of moments rather than the $\Delta$'s. In this case Fig. A consists of 1, $\Sigma x/N$,
$\Sigma x^2/N$, etc. We have such formulas as

$a_{x,y} = \dfrac{\Sigma xy}{N} - \dfrac{(\Sigma x)}{N}\dfrac{(\Sigma y)}{N} = \mu_{11}$, where $a_{x,y} = \dfrac{\Delta_{x,y}}{N^2}$, $a_{x^2,y} = \dfrac{\Delta_{x^2,y}}{N^2}$, etc.

It would be possible to compute the $\alpha_4$ in a somewhat similar fashion though
it would take somewhat longer. In the first place we would have to compute
$\Sigma x^2y^2$ from the correlation table. This could be done by forming the cumulation
$C(y^2_x)$ and multiplying by 1, 3, 5, 7, 9, etc. When this is done, however,
it does not appear that the calculation of the central moments of the fourth order
can be reduced to as simple a ritual as the calculation of the third order moments.

The question should be raised as to the calculation of the skewness when
there are two or more independent variables. This can be done, of course, but
the calculations are lengthy. The point of the present paper is to provide an
easy and simple technique for computing the skewness of residuals in the case
of two variable linear regression.


REFERENCES 

[1] Paul S. Dwyer and Alan D. Meacham, "The preparation of correlation tables on a
tabulator equipped with digit selection," Jour. Am. Stat. Ass., Vol. 32 (1937),
pp. 654-62. See particularly page 657.

[2] Paul S. Dwyer, "The computation of moments with the use of cumulative totals,"
Annals of Math. Stat., Vol. 9 (1938), pp. 288-304. See particularly pages 299-303.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON THE ADJUSTMENT OF OBSERVATIONS 

By Arthur J. Kavanagh
The Forman Schools, Litchfield, Conn. 

The method of least squares has been extended to the adjustment of observations
with errors in more than one variable. The history of the development
and its principal results have been given by Deming [2], [3], [4], [5]. The basis
is the assumption that for the "best" adjustment the sum of the weighted
squares of all the residuals (observed values minus adjusted values) must be
made a minimum with respect to the adjustments to the observations and with
respect to the parameters involved in the conditions the adjusted values must
satisfy. In certain problems, such as some arising in the study of relative
growth in biology, this assumption is not adequate; it is necessary that the
sum to be minimized be generalized to include cross products as well as squares
of the residuals.

Suppose we have a set of n universes of q-dimensional points whose centers of
gravity are known to satisfy certain conditions; for instance, they might all lie
on a certain type of curve. A sample having been taken from each universe,
the center of gravity of each sample is taken as the observed center of gravity
of the corresponding universe, and it is desired to determine the most probable
set of adjustments to the coordinates and the most probable set of parameters
involved in the conditions, subject to the requirement that the adjusted values
satisfy the conditions exactly. It is assumed that the sampling distribution of
the center of gravity in each universe satisfies the multivariate normal law, and
that the standard deviations and coefficients of correlation of each sample may
with sufficient accuracy be taken as the constants of the corresponding universe.
Then by reasoning analogous to that of the derivation of the least squares
principle for one variable from the univariate normal law, the probability of
getting the observed set of values is proportional to $e^{-Q}$, where

(1) $Q = \sum_{i=1}^{n} Q_i$,

$Q_i$ being a homogeneous quadratic function of the errors at the ith centroid and
in general involving the cross products as well as the squares of the errors.



The probability will be a maximum when Q is a minimum. Consequently the
best estimates for the coordinates of the centroids will be those making Q a
minimum, subject to the conditions which the coordinates must satisfy.

For example it may be desired to study the relation between height and weight
among growing boys by fitting a curve to the points whose abscissa and ordinate
are respectively average height and average weight of a particular age group,
one point corresponding to each age group in the study. The data for such a
study are obtained from samples of the several age groups. Then the number
n of universes is the number of age groups being studied, each universe consisting
of the totality of two-dimensional points obtained by pairing the height
with the weight of each boy in the age group. The centroid or "average point"
of each universe would ideally be obtained from measurements of all the individuals
of that age, but since sampling must be resorted to it is necessary to
make allowances for the sampling distributions of the centroids. It is known
that within each age group there is correlation between height and weight [1].
Consequently the sampling distribution of each centroid will exhibit a correlation
which can be expressed in terms of the coefficient of correlation between
height and weight of the individuals of the universe from which the sampling
distribution arises. The existence of this correlation results in the presence of
the cross-product term in the exponent of the bivariate normal formula describing
the sampling distribution of the average values, that is in the $Q_i$ of each
centroid. If there were no such correlations the cross-product term in each $Q_i$
would vanish and the situation would reduce to that of least squares.

In the general case, let $X_{1i}, X_{2i}, \cdots, X_{qi}$ be the observed coordinates of
the ith centroid, $x_{1i}, x_{2i}, \cdots, x_{qi}$ the adjusted values (to be determined), and
$V_{ji} = X_{ji} - x_{ji}$. Then $Q_i$ may be written

$Q_i = w_{11i}V_{1i}^2 + w_{12i}V_{1i}V_{2i} + \cdots + w_{1qi}V_{1i}V_{qi}$

(2) $\quad + w_{21i}V_{2i}V_{1i} + w_{22i}V_{2i}^2 + \cdots + w_{2qi}V_{2i}V_{qi}$

$\quad + \cdots$

$\quad + w_{q1i}V_{qi}V_{1i} + w_{q2i}V_{qi}V_{2i} + \cdots + w_{qqi}V_{qi}^2$,

the w's being the weights, with $w_{jki} = w_{kji}$. Thus in the case of two variables,
if $N_i$ be the number of items in the ith sample, $r_i$ its coefficient of correlation,
and $\sigma_{1i}$, $\sigma_{2i}$ its standard deviations, then

$w_{11i} = \dfrac{N_i}{2(1 - r_i^2)\sigma_{1i}^2}$, $\quad w_{12i} = \dfrac{-N_i r_i}{2(1 - r_i^2)\sigma_{1i}\sigma_{2i}}$, $\quad w_{22i} = \dfrac{N_i}{2(1 - r_i^2)\sigma_{2i}^2}$.



The coefficients of the cross products in Q involve the coefficients of correlation
of distributions. If the latter are all zero the cross products vanish and Q
reduces to the sum of weighted squares, which is the basic expression of the
least squares procedure. Consequently, from this point of view, the least squares
assumption is equivalent to the assumption of zero correlation between the
errors. The procedure in the more general situation might be called "least
quadratics".
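For the two-variable case the weights and the quadratic form can be sketched as follows. The function names and the numerical inputs are hypothetical illustrations, not values from the paper.

```python
def least_quadratics_weights(n_i, r_i, sigma1, sigma2):
    """Weights (w11, w12, w22) of the i-th centroid in the two-variable case."""
    c = n_i / (2.0 * (1.0 - r_i ** 2))
    w11 = c / sigma1 ** 2
    w12 = -c * r_i / (sigma1 * sigma2)  # equals w21 by symmetry
    w22 = c / sigma2 ** 2
    return w11, w12, w22


def q_i(v1, v2, weights):
    """Quadratic form Q_i for residuals v1, v2 of one centroid."""
    w11, w12, w22 = weights
    # The terms w12*V1*V2 and w21*V2*V1 of (2) combine into 2*w12*V1*V2.
    return w11 * v1 * v1 + 2.0 * w12 * v1 * v2 + w22 * v2 * v2


# Hypothetical sample: 50 boys, r = 0.6, standard deviations 2 and 3.
weights = least_quadratics_weights(50, 0.6, 2.0, 3.0)
print(q_i(0.5, -0.4, weights))

# With zero correlation the cross-product weight vanishes (least squares).
print(least_quadratics_weights(50, 0.0, 2.0, 3.0)[1] == 0)
```

When the correlation is zero only the weighted squares remain, which is the least squares case described in the text.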






The Lagrange method of undetermined multipliers can be used to calculate
the values of the adjustments to the coordinates and the values of the parameters.
The procedure is the same as for least squares [2], [3], [5], the only
difference being the somewhat greater complication of the algebra. We shall
summarize the development here.

The condition equations, supposed $\nu$ in number, may be written

$F^h(x_{11}, \cdots, x_{qn};\ p_1, p_2, \cdots, p_r) = 0$, $\quad h = 1, 2, \cdots, \nu$,

where each $F^h$ may in general involve any or all of the numbers $x_{ji}$ as well as
any or all of the parameters $p_l$, whose number we suppose to be r. Let

(3) $F^h_{ji} = \partial F^h/\partial x_{ji}$, $\quad F^h_l = \partial F^h/\partial p_l$

where the X's have been substituted for the x's after differentiation, and each
$p_l$ has been replaced by the best available approximate value $p_{l0}$. Let $F^h_0$ be
the value of $F^h$ after the same substitution. Also let $v_l = p_{l0} - p_l$. Then if
the V's and v's are small the conditions may be written

(4) $\sum_{i}\sum_{j} F^h_{ji}V_{ji} + \sum_{l} F^h_l v_l = F^h_0$, $\quad h = 1, 2, \cdots, \nu$.



Differentiate Q with respect to the V's and equate the result to zero, eliminating
the factor 2. Differentiate (4) with respect to the V's and the v's, multiply
each equation by the corresponding undetermined multiplier $-\lambda_h$, and sum
the results together with the result from differentiating Q. Collecting coefficients
of the differentials $\delta V_{ji}$ and $\delta v_l$, equating to zero and transposing the
terms involving $\lambda_h$, we get

$w_{11i}V_{1i} + w_{12i}V_{2i} + \cdots + w_{1qi}V_{qi} = [\lambda_h F^h_{1i}]$

(5) $w_{21i}V_{1i} + w_{22i}V_{2i} + \cdots + w_{2qi}V_{qi} = [\lambda_h F^h_{2i}]$ $\quad (i = 1, 2, \cdots, n)$

$\cdots$

$w_{q1i}V_{1i} + w_{q2i}V_{2i} + \cdots + w_{qqi}V_{qi} = [\lambda_h F^h_{qi}]$,

(6) $[\lambda_h F^h_l] = 0$, $\quad l = 1, 2, \cdots, r$,

where the brackets denote summation with respect to h.

Equations (5) can be written down easily, since the coefficients $w_{jki}$ appear
in the same order as in (2). The equations corresponding to each i form a
complete set which can be solved independently of those for other values of i.
The solution can be expressed

$V_{ji} = A_{j1i}[\lambda_h F^h_{1i}] + A_{j2i}[\lambda_h F^h_{2i}] + \cdots + A_{jqi}[\lambda_h F^h_{qi}]$

(7) $\quad = \left[\lambda_h \sum_{k=1}^{q} A_{jki}F^h_{ki}\right]$

where $A_{kji}$ is $(-1)^{k+j}$ times the minor corresponding to $w_{kji}$, divided by the
principal determinant. By symmetry $A_{kji} = A_{jki}$.






The V's in (4) are to be replaced by their values from (7) and the coefficients
of the $\lambda$'s collected. To facilitate this let

$L_{hk} = \sum_{i=1}^{n} L_{hki}$

where

$L_{hki} = \sum_{r=1}^{q}\sum_{s=1}^{q} A_{rsi}F^h_{ri}F^k_{si}$.

Each $L_{hki}$ can be written down easily from the corresponding $Q_i$ as written in
(2); in each term replace $w_{rsi}$ by $A_{rsi}$, $V_{ri}$ by $F^h_{ri}$, and $V_{si}$ by $F^k_{si}$.

It is important to preserve the order of the subscripts of the V's in (2), and to
treat the diagonal terms $w_{rri}V_{ri}^2$ as though written $w_{rri}V_{ri}V_{ri}$. It is seen that
$L_{hki} = L_{khi}$, and $L_{hk} = L_{kh}$. Then the substitution from (7) into (4) gives

(8) $\sum_{k=1}^{\nu} L_{hk}\lambda_k + \sum_{l=1}^{r} F^h_l v_l = F^h_0$, $\quad h = 1, 2, \cdots, \nu$.


Equations (8), with (6), are formally identical with those of the least squares
procedure which are called by Deming the "general normal equations", and
they can be written schematically in the same manner. The further procedure
is identical with that for least squares, involving solution of the general normal
equations for the $\lambda$'s and v's, substitution of the values of the $\lambda$'s into (7) to
obtain the V's, and then adjustment of the observations by use of the V's
and adjustment of the provisional values of the parameters by use of the v's.

A word of appreciation is due Dr. O. W. Richards of The Spencer Lens Company
for calling this problem to my attention, and for encouragement in the
carrying out of the solution.


REFERENCES 

[1] Joseph Berkson, "Growth changes in physical correlation--height, weight, and
chest-circumference, males," Human Biology, Vol. 1 (1929), pp. 462-502.

[2] W. Edwards Deming, "The application of least squares," Phil. Mag., 7th Ser.,
Vol. 11 (1931), pp. 146-168.

[3] W. Edwards Deming, "On the application of least squares," Phil. Mag., 7th Ser.,
Vol. 17 (1934), pp. 804-829.

[4] W. Edwards Deming, Proc. Phys. Soc. Lond., Vol. 47 (1935), pp. 92-106.

[5] W. Edwards Deming, Some Notes on Least Squares, Graduate School, Dept. of Agric.,
Washington.





THE ESTIMATION OF A QUOTIENT WHEN THE DENOMINATOR 
IS NORMALLY DISTRIBUTED 


By Robert D. Gordon

Scripps Institution of Oceanography, La Jolla, Calif.

1. Introduction. In an oceanographic investigation we have to deal with a
time series consisting of single pairs of observed values x, y, of two independent
stochastic variables, whose true (mean) values we shall denote respectively by
a, b. Of interest is the corresponding time series of quotients (b/a), which it
is required to estimate from the observations x, y. Both x and y are approximately
normally distributed about their mean values a, b with rather large
variances $\sigma_x^2$, $\sigma_y^2$, which can be estimated. It is easily possible for x to vanish
or even to be of opposite sign to a, although a cannot itself vanish. The required
estimates of (b/a) should have the property that they can be numerically
integrated, i.e. that an arbitrary sum of such estimates shall equal the corresponding
estimate of the true sum.

Let us define a function $\gamma(x)$ to have the property that its mathematical
expectation $E\{\gamma(x)\}$ is exactly 1/a, where a = E(x). If such a function exists
we shall have

(1) $E\{y\cdot\gamma(x)\} = E(y)\cdot E\{\gamma(x)\} = b\cdot(1/a) = b/a$

so that $y\cdot\gamma(x)$ will be an estimate of b/a which has the required property:
namely such estimates can be added, and we have

$E\{y_1\gamma(x_1) + y_2\gamma(x_2)\} = E\{y_1\gamma(x_1)\} + E\{y_2\gamma(x_2)\} = b_1/a_1 + b_2/a_2$

as required. It turns out that if x is normally distributed with non-zero mean
such a function $\gamma(x)$ does exist, and is given by the formula

(2) $\gamma(x) = \dfrac{1}{\sigma_x}\exp(x^2/2\sigma_x^2)\displaystyle\int_{x/\sigma_x}^{\infty} e^{-t^2/2}\,dt = \dfrac{1}{\sigma_x}R_{x/\sigma_x}$,

where $R_x$ is the "ratio of the area to the bounding ordinate" which is tabulated
by J. P. Mills,¹ also in Pearson's tables.² Equation (2) holds if a is positive; if
a is negative the integration should extend over $(x/\sigma_x, -\infty)$. It is easy to
verify that

(3) $E\{\gamma(x)\} = \dfrac{1}{\sigma_x\sqrt{2\pi}}\displaystyle\int_{-\infty}^{\infty}\gamma(x)\exp\left(-\dfrac{(x - a)^2}{2\sigma_x^2}\right)dx = \dfrac{1}{a}$

by direct substitution from (2).
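A numerical check of the unbiasedness property can be sketched with the standard library. The function names, the choice a = 2, $\sigma_x$ = 1, the integration limits and Simpson's rule are all assumptions of the sketch; the limits must stay moderate because the exponential factor in Mills' ratio overflows for very negative arguments.

```python
import math


def mills_ratio(t):
    """R_t = exp(t^2/2) * integral from t to infinity of exp(-u^2/2) du."""
    return math.sqrt(math.pi / 2.0) * math.exp(t * t / 2.0) * math.erfc(t / math.sqrt(2.0))


def gamma_estimate(x, sigma):
    """Estimate of 1/a from a single normal observation x (positive-mean case)."""
    return mills_ratio(x / sigma) / sigma


def expected_gamma(a, sigma, lo, hi, steps=4000):
    """Simpson's rule for E[gamma(x)] with x normal(a, sigma); steps must be even."""
    h = (hi - lo) / steps

    def integrand(x):
        density = math.exp(-((x - a) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        return gamma_estimate(x, sigma) * density

    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


# Hypothetical case: true mean a = 2 with unit standard deviation.
print(expected_gamma(2.0, 1.0, -20.0, 12.0))
```

The integral should agree with 1/a = 0.5 to within the accuracy of the quadrature, even though a sizable share of the mass of x lies near zero.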


¹ J. P. Mills, "Table of ratio: area to bounding ordinate, for any portion of the normal
curve," Biometrika, Vol. 18 (1926), pp. 395-400.

² Karl Pearson, Tables for Statisticians and Biometricians, part II, table III, Cambridge
Univ. Press.





2. The law of large numbers for $\gamma(x)$. The function $\gamma(x)$ defined by (2) has
mean value 1/a as required, but its second moment (hence variance) does not
exist, as may readily be verified. By a theorem of Khintchine³ however, its
values satisfy a law of large numbers. It will be of interest to inquire about the
"strength" of this law of large numbers for $\gamma(x)$. Namely, given a positive
number $\epsilon$, how many "observations" (independent estimates) $\gamma(x)$ will suffice
to guarantee probabilities of .50, .90, .95, etc. for the following inequality to
hold

(4) $\left|\dfrac{\gamma(x_1) + \gamma(x_2) + \cdots + \gamma(x_n)}{n} - \dfrac{1}{a}\right| < \epsilon$,

where n is the number of "observations."

In order to arrive at a rough answer to this question we have made use of
certain inequalities due to Tshebysheff (Tshebysheff's "method of moments",
cf. Uspensky⁴). Let u be an arbitrary stochastic variable whose distribution
has moments of the first and second order which are known. Denote by m its
first moment, by $\sigma^2$ its variance, then it results from Tshebysheff's theory that
the probability $P(u_1, u_2)$ for a value of u to lie between $u_1$ and $u_2$ (i.e. $u_1 \le u \le u_2$)
satisfies the inequality

(5) 


P(.Ui, Ui) > 1 — 


(Ui — m ) 2 + er 2 


s 

or 

(«i — nl ) 2 +• a 2 ’ 


This inequality is independent of the values, or even the existence, of further
moments of the u-distribution beyond the second, and depends only on the
condition that the cumulant of the distribution function shall have at least three
"points of increase."

Although $\gamma(x)$ does not have a second moment, a second moment does exist
for those values of $\gamma(x)$ which correspond to $x \ge -\theta > -\infty$, where $\theta$ is an
arbitrary number, positive or negative. If we can estimate the first two moments
of $\gamma(x) \sim 1/x$ corresponding to a given value of $\theta$, then for a given number
n of observations we need only to divide the corresponding variance by n to
obtain $\sigma^2$ in (5), then multiply (5) by the nth power of the (normal) probability
for the inequality $x \ge -\theta$, in order to obtain a lower bound for the probability
of the inequality (4). $\theta$ is to be determined so as to yield a maximum result.

The first moment $m_1$ of $\gamma(x)$ for values of $x \ge -\theta$ is easily computed, and is
given by the formula


(6) 





_ P-thi 


³ J. V. Uspensky, Introduction to Mathematical Probability, p. 195, McGraw-Hill (1937).
⁴ J. V. Uspensky, l.c., pp. 365 ff.





The second moment is harder to compute, but if we place


K ■ (r?h — m?) - 


| dx 


(7) 


where 


_ [i* a 0 -) ^ 


l « x p(-- (£ 23-) fe 


.. 1 ( {x - a)‘\ , 1 f 

V 2*r tr« •*-* V 2a* / -\/2ir 

we easily obtain the relationship

(H) $'(<?) 


(04a) /«j 


e -' a/S df 


1 

V' tbrod 


( ifi -j- a) a \ (■,. o'* /1 _ K-*i«x \ 

r - a ■ jy ' 1 -"- 1 ~ t l 1 iur^J. 



From (7), using a table of the probability integral, it can be verified that
— 3ff*) « 0.001. Assume, therefore, as a boundary condition a ~- 
3<r.) • 0 then (8) can be. integrated graphically or numerically. It » by this 
means that the curves shown in Figs. 1 and 2 were determined. Computations 
were also attempted for a/,. - *, of: - 1, but it was not possible to otta 
significant results: it would be necessary in these cases to take more than two
moments into account, which would lead to hopeless complications. In these
figures the ordinates represent probabilities for an observation to fall between
.9a and 1.11a (Figure 1), and between .75a and 1.33a (Figure 2), respectively.










3. Two practical formulas for computations. It seems worthwhile to note
here two simple formulas in connection with Mills' ratio (2) which will be useful
for computations. The first is the obvious relationship

(9) $R_{-x} = \sqrt{2\pi}\,e^{x^2/2} - R_x = 1/z - R_x$

in the notation of Pearson's tables. The second applies to large values of x,
and may be written

(10) $\dfrac{x}{x^2 + \sigma_x^2} < \gamma(x) = \dfrac{1}{\sigma_x}R_{x/\sigma_x} < \dfrac{1}{x}$.

(10) is true for x > 0, and can be proved by means of the differential equation
which $\gamma(x)$ satisfies.
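Both formulas are easy to check numerically. The sketch below works in standardized units ($\sigma_x$ = 1, an assumption made for brevity), where the bounds read x/(x² + 1) < R_x < 1/x; the function names are hypothetical.

```python
import math


def mills_ratio(t):
    """R_t = exp(t^2/2) * integral from t to infinity of exp(-u^2/2) du."""
    return math.sqrt(math.pi / 2.0) * math.exp(t * t / 2.0) * math.erfc(t / math.sqrt(2.0))


def bounds(t):
    """Lower bound, Mills' ratio, and upper bound for t > 0 (unit variance)."""
    return t / (t * t + 1.0), mills_ratio(t), 1.0 / t


def reflected(t):
    """R_{-t} computed from R_t by the reflection relationship."""
    return math.sqrt(2.0 * math.pi) * math.exp(t * t / 2.0) - mills_ratio(t)


for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    lower, ratio, upper = bounds(t)
    print(t, lower < ratio < upper, math.isclose(reflected(t), mills_ratio(-t), rel_tol=1e-9))
```

The two bounds close in on the ratio as t grows, which is why the second formula is useful for large arguments where tables run out.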


4. Remarks. The estimate $\gamma(x)$ has the following inadequacy: If only a single
observation x is known, then it is unknown whether a is of like or unlike sign
compared to x. It turns out then that the mathematical expectation for the
value of $\gamma(x)$ vanishes identically. This difficulty of course disappears if more
than one observation is available. Methods of avoiding this difficulty for time
series, e.g. by noting relative frequencies for observations separated by 1, 2, 3
etc. intervals to agree in sign, will be discussed elsewhere in connection with
practical applications.

It may be worthwhile to note that Geary⁵ developed certain characteristics
of the distribution of a quotient, which however are not adapted to our purposes.


NOTE ON CONFIDENCE LIMITS FOR CONTINUOUS DISTRIBUTION 

FUNCTIONS 

By A. Wald* and J. Wolfowitz

In a recent paper [1] we discussed the following problem: Let X be a stochastic 
variable with the cumulative distribution function /(x), about which nothing is 
known except that it is continuous. Let $x_1, \cdots, x_n$ be n independent, random
observations on X. The question is to give confidence limits for f(x). We
gave a theoretical solution when the confidence set is a particularly simple and 
important one, a “belt.” 

A particularly simple and expedient way from the practical point of view is 
to construct these belts of uniform thickness ([1], p. 115, equation 50). If the 
appropriate tables, as mentioned in our paper, were available, the construction 
of confidence limits, no matter how large the size of the sample, would be immediate.

Our formulas (11), (16), (19), (27) and (29) are not very practical for computation,
particularly when the samples are large. We have recently learned that

⁵ Geary, R. C., "The Frequency Distribution of a Quotient," Jour. Roy. Stat. Soc.,
Vol. 93 (1930), pp. 442-446.

* Columbia University, New York City.





there exists a result by Kolmogoroff [2], generalized by Smirnoff [3],† which for
large samples gives an easy method for constructing tables, i.e. of finding $\alpha$
when c and n are given (all notations as in [1]). The result of Kolmogoroff-Smirnoff
is:

Let $c = \lambda/\sqrt{n}$. Then for any fixed $\lambda > 0$,

$\lim_{n\to\infty} \bar{P} = \lim_{n\to\infty} \underline{P} = 1 - e^{-2\lambda^2}$,

$\lim_{n\to\infty} P = 1 - 2\sum_{m=1}^{\infty} (-1)^{(m-1)} e^{-2m^2\lambda^2}$.

This series converges very rapidly.
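The rapid convergence is easy to see numerically. The sketch below evaluates the two-sided series and the one-sided limit; the function names, the truncation after a fixed number of terms, and the sample values of $\lambda$ and n are assumptions for illustration.

```python
import math


def two_sided_limit(lam, terms=100):
    """1 - 2 * sum over m >= 1 of (-1)^(m-1) * exp(-2 m^2 lam^2), truncated."""
    series = sum(
        (-1) ** (m - 1) * math.exp(-2.0 * m * m * lam * lam)
        for m in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


def one_sided_limit(lam):
    """Limiting probability for a one-sided belt: 1 - exp(-2 lam^2)."""
    return 1.0 - math.exp(-2.0 * lam * lam)


def belt_half_width(lam, n):
    """Half-width lam / sqrt(n) of a uniform belt for a sample of size n."""
    return lam / math.sqrt(n)


lam = 1.36  # hypothetical choice giving a two-sided limit near .95
print(two_sided_limit(lam), two_sided_limit(lam, terms=3))
print(one_sided_limit(1.22), belt_half_width(lam, 400))
```

Three terms already reproduce the full sum to many decimal places for this $\lambda$, so a table for large samples requires very little computation.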

REFERENCES 

[1] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions,"
Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[2] A. Kolmogoroff, "Sulla determinazione empirica di una legge di distribuzione,"
Giornale dell'Istituto Italiano degli Attuari, Vol. 4 (1933).

[3] N. Smirnoff, "Sur les écarts de la courbe de distribution empirique," Recueil Mathématique
(Matematicheskii Sbornik), New Series, Vol. 6(48) (1939), pp. 3-26.


† In the French résumé of Smirnoff's article, on page 26, due to a typographical error
this formula is given with a factor $(-1)^m$ instead of the correct factor $(-1)^{m-1}$. The
correct result follows from equation (112), page 23, of the Russian text when £ is set equal 
to zero.



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 

The Sixth Annual Meeting of the Institute of Mathematical Statistics was
held at the Stevens Hotel, Chicago, Thursday to Saturday, December 26 to 28,
1940 in conjunction with the meetings of the American Statistical Association,
the Econometric Society, and the American Marketing Association. The following
fifty members of the Institute attended the meeting:

H, K Arnold, C S Bari ell, A G. Brooku, II A' Baines*., V. G Chub, A. C Cohen, Ji . 
W G Corhian, A, T Craig, G G. Craig, B, H. Day, W 1-. Denting, J. h, Umdi, I 1 , S Jhwer, 
Churchill Kiaenliart, J W Fi'rtig, P. G Fox, Hilda Gel ringer, K .1, f ituiilN*], Myrmi Henb 
ingsfield, Iliuold Hotelling, Lea Katz, J. F. Kenney, L F Kinidseri, Min i K"hl, T, Kenji* 
mans, D, H Leavens, Ida, Levin, G A Luudhetg, S, X. Lviile, W i; M.oliw, Italjili 
Mansfield, G. F, T Mayor, ,f. R. Miner, K ('. Malnui, C. tt Mummery,,! t Nuilhnm, 
E G, Olds, P. 8. Olrnsto'ad, A L. O'Toole, ,r A. Pierce, Wilhelm tteif/., I‘ U Rider, M M. 
Sandomirc, B, W, Shaw, W V. Shewhiut, F, F Klepinin, K. A. Kf milter. A. <> Swiunmti, 
S S. Wilks, M 0, Woodbury 

The opening session, on Thursday afternoon, was devoted to contributed
papers in probability and statistical methodology. The Chairman was Professor
S. S. Wilks of Princeton University, and the following papers were presented:

1. On the Calculation of the Probability Integral of Non-Central t and an Application.

C. C. Craig, University of Michigan.

2. Effective Methods of Graduation.

Max Sasuly, Office of the Actuary, Social Security Board.

3. On Some New Results in the Sampling of Discrete Random Variables.

William G. Madow, Bureau of the Census.

4. On the Use of Inverse Probability in Sample Inspection.

W. Edwards Deming and W. G. Madow, Bureau of the Census.

5. On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table when
Some of the Marginal Totals are Known.

F. F. Stephan, Cornell University, and W. Edwards Deming, Bureau of the Census.

6. The Return Period of Flood Flows.

E. J. Gumbel, New School for Social Research, New York City.

7. A Note on the Power of a Sign Test.

W. M. Stewart, University of Wisconsin.

8. A New Explanation of Non-Normal Dispersion.

Hilda Geiringer, Bryn Mawr College.

Abstracts of these papers follow this report. 

On Friday morning a session was held jointly with the American Marketing
Association on The Theory and Application of Representative Sampling. Under
the chairmanship of Professor Theodore H. Brown of Harvard University, the
following papers were presented:

1. Background and Method.

F. F. Stephan, Cornell University.




2. Application to Marketing Problems.

Archibald M. Crossley, New York City.

3. Application to Agricultural Problems.

Arnold J. King, Iowa State College.

The afternoon session on Friday was held jointly with the American Statistical
Association and Econometric Society on The Analysis of Variance. The
chair was held by Professor P. R. Rider of Washington University and the following
papers were presented:

1. The Relation Between the Design of an Experiment and the Analysis of Variance.

A. E. Brandt, Soil Conservation Service.

2. The Underlying Principles of the Analysis of Variance and Associated Tests of
Significance.

Churchill Eisenhart, University of Wisconsin.

3. The Applications of the Analysis of Variance to Non-Orthogonal Data.

W. G. Cochran, Iowa State College.

Discussion:

Gertrude M. Cox, North Carolina State College.

John F. Kenney, University of Wisconsin Extension Division.

W. Edwards Deming, Bureau of the Census.

On Saturday morning and afternoon, sessions were held with the American
Statistical Association on Collection and Use of Statistics for Quality Control in
National Defense Industries. At the morning session the following papers were
given, with Dr. C. W. Gates of the Western Electric Company in the chair:

1. Report on the Quality Control Program of the American Standards Association.

John Gaillard, Western Electric Company.

2. Sample Verification in the Administration of the Population Census.

W. Edwards Deming, Bureau of the Census.

3. The Importance of the Statistical Viewpoint in High Production Manufacturing.

P. L. Alger, General Electric Company. (Read by C. Eisenhart.)

4. On the Initiation of Statistical Methods for Quality Control in Industry.

Leslie E. Simon, Aberdeen Proving Ground.

At the afternoon session the following papers were presented under the
chairmanship of Dr. John Johnston of the United States Steel Corporation:

1. The Place of Statistical Analysis in Ferrous Metallurgy.

E. M. Schrock, Jones and Laughlin Steel Corporation.

2. Statistical Methods in the Production and Inspection of Cast Iron Pipe.

J. T. MacKenzie, American Cast Iron Pipe Company.

3. Applications of Statistical Methods to Metallurgy.

H. B. Mears, Aluminum Company of America.

Discussion: 

Churchill Eisenhart, University of Wisconsin.

The annual business meeting of the Institute was held on Thursday afternoon 
after the session on probability and statistical methodology, with the President 
presiding. 

The Secretary-Treasurer read the financial report for 1940. 





The Editor of the Annals of Mathematical Statistics reported on the progress
of the Annals during 1940. It was stated that manuscripts worthy of publication
were now being submitted at a rate that would justify the publication of
a 500-page annual volume. To make this amount of publication self-supporting
upon the expiration of the Rockefeller grant in June, 1941, it was pointed out
that another 150 new subscriptions would have to be obtained during 1941.
Judging from the rate at which subscriptions had been coming in during the
past two years such an increase was considered entirely feasible with the
cooperation of the members of the Institute. Various methods of effecting this increase
were discussed at the meeting and suggested for the consideration of the Board
of Directors.

On behalf of the Board of Directors the President made the following report: 

1. The Report of the War Preparedness Committee, approved in preliminary
form at the Hanover meeting, had been preprinted and some of the preprints
had already been distributed.

2. Arrangements had been made with the Executive Officer of the National
Roster of Scientific and Specialized Personnel to send the statistics check list
to all members of the Institute who are not members of the American Statistical
Association.

3. That preprints of the pamphlet on The Teaching of Statistics, including an
address by Professor Harold Hotelling, discussion by Dr. W. E. Deming and
the resolutions on the teaching of statistics adopted by the Institute at the
Dartmouth meeting had been produced and distributed.

4. That application¹ had been made to the Executive Committee of the American
Association for the Advancement of Science through the Permanent Secretary
for admission to the status of an affiliated society in the Association.

It was announced that through the annual election, carried out by mail ballot,
the following officers were elected for 1941 (all names being those proposed by
the Nominating Committee):

President: Professor Harold Hotelling
Vice-Presidents: Professor A. T. Craig
Professor H. C. Carver
Secretary-Treasurer: Professor E. G. Olds

The annual luncheon was held at noon on Friday with the President-Elect
presiding. Short talks were made by Dr. E. J. Gumbel, Dr. T. Koopmans and
Professor S. S. Wilks, while the annual luncheon address was delivered by
Professor P. R. Rider.

P. R. Rider,
Secretary-Treasurer.

¹ This application was approved by the Executive Committee of the A.A.A.S. at its
December 1940 meeting.



ABSTRACTS OF PAPERS 


(Presented on December 26, 1940, at the Chicago meeting of the Institute) 

On the Calculation of the Probability Integral of Non-Central t and an
Application. C. C. Craig, University of Michigan.

It seems not to have been noted that the probability integral for non-central $t$ can be
calculated by means of an infinite series in incomplete $\beta$-functions which converges rapidly
for small samples. The application here considered is to a test based on the randomization
principle which is the subject of E. J. G. Pitman's paper, "Significance tests which may be
applied to samples from any populations" (Roy. Stat. Soc. Jour., Vol. 4 (1937), pp. 119-130).
In case the samples come from normal populations with equal variance but with unequal
means, the chance that the hypothesis of equal population means will be accepted on this
test is given by this probability integral, which is evaluated in some illustrative numerical
examples.

On Some New Results in the Sampling of Discrete Random Variables.
William G. Madow, Bureau of the Census.

Many statistical tables may be regarded as the result of subsampling finite populations
classified into $r \times s \times \cdots$ tables. The main aim of this paper is to derive the associated
statistical theory including both the finite and limiting distributions. After evaluating
the fundamental distributions and the moments it is shown that under certain conditions
the limiting distribution is multinomial, while under other conditions the limiting
distribution is multivariate normal. These results are then applied to determine the adequate size
of sample, and the sampling proportions from various strata.

On the Use of Inverse Probability in Sample Inspection. W. Edwards Deming
and William G. Madow, Bureau of the Census.

The theory of inspection by sampling is abstractly equivalent to one part of the theory
of subsampling. The theory of subsampling finite populations is considered in this paper
in order to investigate the differences that occur when the methods of fiducial inference and
inverse probability are used, particularly in regard to determining the adequate size of
sample. In sample inspection, the prior distribution of failures is almost always known,
at least approximately. In using any system of sample inspection, a number of failures will
pass undetected. On the basis of certain prior distributions of failures, distributions are
derived for the number and percent of failures remaining after each of several different
possible systems of sample inspection has been applied. Formulas giving the cost of partial
inspection are used together with these distributions in order to determine methods of
sample inspection having various desired properties.

On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table
When Some of the Marginal Totals are Known. Frederick F. Stephan,
Cornell University and W. Edwards Deming, Bureau of the Census.

The 5 per cent sample taken with the 1940 Population Census presents an interesting
problem of estimation in which the estimates are connected by equations of condition.
These equations arise from the fact that certain sums of estimates derived from the sample
should equal the corresponding frequencies derived from the tabulations of the returns
enumeration, i.e. the distribution of each of several variables may be known but their
joint distribution may only be estimated from a cross tabulation of the data furnished by
the sample. The adjustment of the sample estimates is accomplished by the principle of
least squares and an outline of the various types of conditions for two and three variables
is presented. The solution of the normal and condition equations is tedious when hundreds
of sets of estimates must be adjusted but a simple iterative procedure is available (see
Annals of Math. Stat., Vol. 11 (1940), pp. 427-444).

The Return Period of Flood Flows. E. J. Gumbel, New School for Social
Research (N. Y.)

For any statistical variable the return period is defined as the mean number of trials
necessary in order that a certain value of the variable or a greater one returns. The return
period is a theoretical statistical function such as the distribution or the probability. In
hydraulics the corresponding observed values are the recurrence and exceedance intervals.

The main thesis is that the flood flows are the largest values of flows which have to be
considered as unlimited variables. The method of return periods applied to the largest values
leads without further assumptions to a formula which gives the return period $T(x)$ of a flood
superior to $x$, and at the same time the most probable flood to be reached not at a certain
time, but within a certain period. This formula contains only two constants, which are
linear functions of the mean annual flood and the standard deviation. Fuller's formula
turns out to be an asymptotic expression of my formula.

This method applied to the Connecticut, Columbia, Merrimack, Cumberland, Tennessee
and Mississippi rivers shows a very good fit between theory and observation, superior to
the methods applied heretofore.

A Note on the Power of the Sign Test. W. M. Stewart, University of
Wisconsin.

Let us consider a set of $N$ non-zero differences, of which $x$ are positive and $N - x$ are
negative, and suppose that the hypothesis tested, $H_0$, implies in independent sampling
that $x$ will be distributed about an expected value of $N/2$ in accordance with the binomial
$(\tfrac{1}{2} + \tfrac{1}{2})^N$. As a quick test of $H_0$, we may choose to test the hypothesis $h_0$ that $x$ has the
above probability distribution. Defining $r$ to be the smaller of $x$ and $N - x$, the test
consists in rejecting $h_0$ and therefore $H_0$ whenever $r < r(\epsilon, N)$, where $r(\epsilon, N)$ is determined by
$N$ and the significance level $\epsilon$.

In applying such a test it is of interest to know how frequently it will lead to a rejection
of $H_0$ when $H_0$ is false and the actual situation $H_1$ implies that the probability law of $x$ is
$(q + p)^N$ with $p \ne \tfrac{1}{2}$, thereby indicating an expectation of an unequal number of $+$ and $-$
differences. The probability of rejecting $H_0$ when $H_1$, implying $p = p_1$, is true, is termed the
power of the test of $H_0$ relative to the alternative $H_1$.

A table is given for the 5% significance level ($\epsilon = .05$) showing the minimum value of
$N$ for which the power of the test relative to $p = p_1$ exceeds $\beta$ for values of $\beta$ from .05 to .95
at intervals of .05; and for $p_1$ from .60 to .95 (and thereby for $p_1$ from .40 to .05) at intervals
of .05. The case of $\beta > .99$ is also considered for these values of $p_1$.
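
The power calculation described in this abstract can be sketched numerically. The following Python sketch is an illustration only, not the author's table or method: the function names are hypothetical, and the rejection rule is assumed to be "reject when min(x, N - x) <= c", with c the largest integer whose two-sided tail probability under p = 1/2 does not exceed the significance level. That convention may differ by one from the abstract's r(ε, N).

```python
from math import comb


def binom_pmf(N, x, p):
    """Probability of exactly x positive differences out of N."""
    return comb(N, x) * p**x * (1 - p) ** (N - x)


def two_sided_tail(N, c, p):
    """P(min(x, N - x) <= c) when x is Binomial(N, p)."""
    return sum(binom_pmf(N, x, p) for x in range(N + 1) if min(x, N - x) <= c)


def critical_value(N, eps=0.05):
    """Largest c with P(min(x, N - x) <= c | p = 1/2) <= eps; -1 if none exists."""
    c = -1
    while c + 1 <= N // 2 and two_sided_tail(N, c + 1, 0.5) <= eps:
        c += 1
    return c


def power(N, p1, eps=0.05):
    """Probability of rejecting H0 when the true probability of a '+' is p1."""
    c = critical_value(N, eps)
    return 0.0 if c < 0 else two_sided_tail(N, c, p1)


def min_sample_size(p1, beta, eps=0.05, n_max=2000):
    """Smallest N (up to n_max) whose power against p1 exceeds beta."""
    for N in range(1, n_max + 1):
        if power(N, p1, eps) > beta:
            return N
    return None


if __name__ == "__main__":
    print(critical_value(10), power(10, 0.5), min_sample_size(0.9, 0.5))
```

Because the binomial is discrete, the power is not monotone in N, so `min_sample_size` reports only the first N at which the requested power is exceeded.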

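A New Explanation of Non-Normal Dispersion. Hilda Geiringer, Bryn
Mawr College.

The starting point of the Lexis theory consists in this fact: It is to be expected, on the
average, that two expressions $\Sigma$ and $\Sigma'$ which can be computed from the results of $m \cdot n$
observations are equal, provided that the corresponding $m \cdot n$ chance variables are
equally and independently distributed. Let $a_\mu$ be the average
$a_\mu = \frac{1}{n} \sum_{\nu=1}^{n} x_{\mu\nu}$, and $a$ the average
of the $a_\mu$ $(\mu = 1, \cdots, m)$. Then

[The displayed definitions of $\Sigma'$ and $\Sigma$ are not recoverable from this copy. The legible
fragments show $\Sigma'$ built from the double sum of the squared deviations $(x_{\mu\nu} - a)^2$ over
all $m \cdot n$ observations, and $\Sigma$ built from the sum of the squared deviations $(a_\mu - a)^2$
over the $m$ rows.]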

We see, however, that rows and columns do not play the same role here because $\Sigma$ depends
only on the $a_\mu$, the average values of the rows. If the observed value of $\Sigma$ happens to be
larger (smaller) than the value of $\Sigma'$, we speak of supernormal (subnormal) dispersion.
It is well known that supernormal dispersion can be explained by assuming that the $m \cdot n$
theoretical populations are only equal "by rows" but not by columns (there are $m$ different
distributions); in the same way one can explain the case of subnormal dispersion by
admitting that the distributions are equal "by columns," but not by rows.

Another explanation which may sometimes seem more plausible is the following: All
the $m \cdot n$ distributions are supposed to be equal, but we omit the assumption of mutual
independence. Then one can prove that the supernormal or subnormal dispersion corresponds
respectively to an appropriately defined "positive" or "negative correlation." The fact
that normal dispersion occurs rather rarely in social questions is then reflected by the idea
that social phenomena are in fact not independent of each other but are usually only
assumed so for the purpose of simplicity. In that way the more frequent occurrence of
supernormal dispersion likewise finds an adequate explanation.




THE ANNALS

of

MATHEMATICAL

STATISTICS

(founded by H. C. Carver)

The Official Journal of the Institute of
Mathematical Statistics

Vol. XII, No. 2, June, 1941

























CYCLIC EFFECTS OF LINEAR GRADUATIONS 


that is to make the graph a perfect zig-zag where positive and negative values
of $z_i$ alternate. A set of differencings following a set of summings may bring
the cycle length from some fairly large number back to about 3, and thus restore
something like the original chaotic appearance in the graph.

In dealing with the foregoing $\Phi(x)$ or $\phi(x)$ in (2), it was not assumed that the
distribution be normal. But, in what follows, it will be assumed that

(3) $\phi(x) = \dfrac{1}{\sigma (2\pi)^{1/2}}\, e^{-(x-\mu)^2/2\sigma^2},$

and, for convenience, $\mu$ will be taken as zero; that is, the data will be supposed
given as deviations from their theoretic mean. Actually, the data used by
Slutzky and the data I have used belong to a rectangular distribution, as noted
in my former paper. Nevertheless the close agreement between actual and
expected results seems to indicate [3, p. 263] that the theory is in general applicable.
It is well known that averaging of observations from non-normal distributions
may lead rather quickly to an approximately normal distribution.

Given $n$ real numbers, $a_1, a_2, \cdots, a_n$, let

(4) $y_j = a_1 x_i + a_2 x_{i+1} + \cdots + a_n x_{i+n-1}, \quad i = 1, 2, 3, \cdots.$

Then $y_j$ is the moving sum if each $a_r = 1$. Slutzky takes $j = i$ or $j = i + n - 1$.
Again, $y_j$ is the moving average if each $a_r = 1/n$. For graduation in general,
the condition $\sum a_r = 1$ is imposed; and usually $j = i + (n - 1)/2$. If $n$ is odd,
$y_j$ is thus associated with the middle $x$.

Under the assumption that the $x$'s are independent and normally distributed
about mean zero, with constant variance, I have proven [3, p. 256]: The
probability that for any specified $j$, $y_{j-1} < 0$ and $y_j > 0$ is given by $P = \theta/360°$,
where

(5) $\cos\theta = \sum_{r=1}^{n-1} a_r a_{r+1} \Big/ \sum_{r=1}^{n} a_r^2.$

The expected relative frequency of up-crossings of the graph of the $y$'s through
the zero base line is then $\theta/360°$. That is: $\theta/360°$ is the expected relative
frequency of a change in the sign of $y$ from $-$ to $+$; also, of a change in sign
from $+$ to $-$.

But, as $\Delta y_j = y_{j+1} - y_j$, it follows that

(6) $\Delta y_j = b_1 x_i + b_2 x_{i+1} + \cdots + b_n x_{i+n-1} + b_{n+1} x_{i+n},$

where

(7) $b_1 = -a_1, \quad b_{n+1} = a_n, \quad b_r = a_{r-1} - a_r, \quad r = 2, 3, \cdots, n;$

and since a maximum for the $y$'s at $y_j$ occurs when $\Delta y_{j-1} > 0$, $\Delta y_j < 0$, it follows
that the theoretic frequency therefor is $\theta'/360°$, where

(8) $\cos\theta' = \sum_{r=1}^{n} b_r b_{r+1} \Big/ \sum_{r=1}^{n+1} b_r^2.$





EDWARD L. DODD 


In a similar manner, by using second differences, we get the expected relative
frequency $\theta''/360°$ for inflexional points, in specified direction. Moreover,
$\theta \le \theta' \le \theta'' \le \cdots \le 180°$; since inflections must be at least as frequent as
maxima, etc.

If the foregoing formulas are applied to the identical "graduation" $y_i = x_i$,
then $\cos\theta = 0$, $\cos\theta' = -1/2$, $\cos\theta'' = -2/3$. In fact,

(9) $\cos\theta^{(t)} = -t/(t + 1).$

This follows from the fact that the $b$'s and similar coefficients are the binomial
coefficients; and

(10) $\sum_{r=0}^{t} ({}_tC_r)^2 = {}_{2t}C_t; \quad \sum_{r=0}^{t-1} {}_tC_r \cdot {}_tC_{r+1} = {}_{2t}C_{t-1}.$

Thus repeated differencing leads toward the perfect zig-zag. An extension of
this feature will be taken up in the next section.
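
Formulas (5), (7), and (9) can be checked numerically for the identical "graduation." The Python sketch below is an illustration under stated assumptions, not part of the original paper: the helper names `cos_theta`, `difference`, and `cycle_length` are hypothetical, and the data are taken to be independent with equal variance, so that only the coefficient list matters.

```python
from math import acos, degrees


def cos_theta(coeffs):
    """Formula (5): sum of adjacent products divided by the sum of squares."""
    adjacent = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    return adjacent / sum(a * a for a in coeffs)


def difference(coeffs):
    """Formula (7): b_1 = -a_1, b_r = a_{r-1} - a_r, b_{n+1} = a_n."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]


def cycle_length(coeffs):
    """Reciprocal of the expected relative frequency theta / 360 degrees."""
    return 360.0 / degrees(acos(cos_theta(coeffs)))


if __name__ == "__main__":
    coeffs = [1]  # the identical "graduation" y_i = x_i
    for t in range(5):
        # cos_theta should agree with -t / (t + 1), as in (9)
        print(t, coeffs, cos_theta(coeffs), round(cycle_length(coeffs), 2))
        coeffs = difference(coeffs)
```

Each pass through `difference` produces the next row of alternating binomial coefficients, which is why the sums in (10) govern the result.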

3. Repeated summing and differencing. To indicate the result of the
summing of $n$ consecutive numbers in a sequence, I shall use the notation $1^n$. And
the difference $\Delta y_i = y_{i+1} - y_i$ will be indicated by $-1, 0^{n-1}, 1$. Thus if $n = 3$,
$1^3$ and $-1, 0^2, 1$ will stand respectively for

(11) $y_i = x_{i-1} + x_i + x_{i+1}; \quad \Delta y_i = -x_{i-1} + 0x_i + 0x_{i+1} + x_{i+2}.$

If, now, $z_i = y_{i-1} + y_i + y_{i+1}$, then

(12) $z_i = x_{i-2} + 2x_{i-1} + 3x_i + 2x_{i+1} + x_{i+2}.$

Since $(n)$ is often used to indicate the operation of summing $n$ consecutive
numbers, we may write

(13) $(3)^2 = 1, 2, 3, 2, 1; \quad (n)^2 = 1, 2, \cdots, (n - 1), n, (n - 1), \cdots, 2, 1.$

Then, for $n > 2$,

(14) $\Delta(n)^2 = -1^n, 1^n; \quad \Delta^2(n)^2 = 1, 0^{n-1}, -2, 0^{n-1}, 1.$

And, since the operations of summing and differencing are commutative, we
are led to

(15) $F_k = (-1)^k \Delta^k (n)^k = {}_kC_0,\ 0^{n-1},\ -{}_kC_1,\ 0^{n-1},\ {}_kC_2,\ 0^{n-1},\ \cdots,\ (-1)^k\,{}_kC_k;$

as may be established by induction. For from the foregoing, it follows that

(16) $(-1)^k \Delta^k (n)^{k+1} = {}_kC_0^{\,n},\ -{}_kC_1^{\,n},\ \cdots,\ (-1)^k\,{}_kC_k^{\,n}.$

Then, since ${}_{k+1}C_r = {}_kC_r + {}_kC_{r-1}$, we conclude that

(17) $F_{k+1} = (-1)^{k+1} \Delta^{k+1} (n)^{k+1} = {}_{k+1}C_0,\ 0^{n-1},\ -{}_{k+1}C_1,\ 0^{n-1},\ \cdots,\ (-1)^{k+1}\,{}_{k+1}C_{k+1}.$

If now $n \ge 2$, then from (5) and (15) we find that

(18) $\cos\theta = 0; \quad \theta/360° = 1/4.$





Thus, the expected frequency of the changes in sign of $\Delta^k(n)^k$ is the same as
that for the raw or ungraduated data. Moreover, if $n \ge 3$, (8) leads to $\cos\theta' = -1/2$,
found for the data. For, in this case, at least two zero coefficients
intervene between any two non-zero coefficients. And thus

(19) $\cos\theta' = -\sum_{r=0}^{k} ({}_kC_r)^2 \Big/ 2\sum_{r=0}^{k} ({}_kC_r)^2 = -1/2.$

In fact, the same factor cancels from numerator and denominator as we take
higher differences, if a sufficient number of zeros intervene. More explicitly
stated, the formula (9) found for the data is valid also for $\Delta^k(n)^k$, provided
$n \ge t + 2$.

To make this more concrete, it may be noted that cycle lengths corresponding
to $t = 0, 1, 2, 3,$ and $4$, are respectively

(20) 4, 3, 2.73, 2.60, 2.52.

From (15), we see directly that an element of $\Delta^k(n)^k$ is correlated only with
certain other elements which are at distances from it which are multiples of $n$.
Some of the foregoing results may be included in a theorem as follows:

Theorem. Given a sequence of independent chance variates, each subject to the
normal distribution (3) with mean zero. Upon this material, let $k$ summings or
averagings by $n$ be performed and $k$ differencings, in any order. Then the resulting
sequence has something of the same chaotic nature as the data. In particular for
$n \ge 2$ the expected frequency of changes of sign is the same, viz., 1/4 for change
from minus to plus and 1/4 for change from plus to minus. Moreover, as $n$ is
increased from 2 to 3, 4, 5, $\cdots$, the expected frequency of other characteristics
becomes the same, maxima and minima, points of inflection, etc., in accordance
with (9).
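
The theorem can be illustrated by building the coefficients of $\Delta^k(n)^k$ directly and applying (5) to their successive differences. The Python sketch below is an illustration only: the helper names `moving_sum`, `difference`, `build`, and `cos_theta` are hypothetical, and the choice k = 3, n = 4 is arbitrary.

```python
def moving_sum(coeffs, n):
    """Coefficients after summing n consecutive terms of the current sequence."""
    out = [0] * (len(coeffs) + n - 1)
    for i, a in enumerate(coeffs):
        for j in range(n):
            out[i + j] += a
    return out


def difference(coeffs):
    """Formula (7): b_1 = -a_1, b_r = a_{r-1} - a_r, b_{n+1} = a_n."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]


def build(n, sums, diffs):
    """Coefficients of `diffs` differencings applied to `sums` summings by n."""
    coeffs = [1]
    for _ in range(sums):
        coeffs = moving_sum(coeffs, n)
    for _ in range(diffs):
        coeffs = difference(coeffs)
    return coeffs


def cos_theta(coeffs):
    """Formula (5): sum of adjacent products divided by the sum of squares."""
    adjacent = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    return adjacent / sum(a * a for a in coeffs)


if __name__ == "__main__":
    k, n = 3, 4
    coeffs = build(n, k, k)  # binomial coefficients separated by n - 1 zeros
    print(coeffs)
    for t in range(4):
        # agreement with -t / (t + 1) is expected only while n >= t + 2
        print(t, cos_theta(coeffs), -t / (t + 1))
        coeffs = difference(coeffs)
```

With n = 4 the values for t = 0, 1, 2 match (9), while t = 3 violates the proviso n >= t + 2 and no longer matches.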

But, suppose now that after $k + 1$ summings by $n$, only $k$ differencings are
performed. Is the resulting sequence almost chaotic? Hardly so. At least, it
can be shown that changes of sign in each direction have no longer an expected
frequency fixed at 1/4, but this expected frequency decreases as $n$ increases.
To show this, formula (5) is applied to (16); and setting in (10), $C = {}_{2k}C_k$,
$C' = {}_{2k}C_{k-1}$, it follows that

(21) $\cos\theta = [(n - 1)C - C']/nC = 1 - (2k + 1)/n(k + 1).$

Then $\cos\theta > 1 - 2/n$, and the cycle length for expected changes of sign in
definite direction is somewhat greater than that obtained by setting $\cos\theta = 1 - 2/n$.
For values of $n$ not too small, we may write $\cos\theta = 1 - \theta^2/2$,
approximately; and then approximately

(22) cycle length for definite change of sign in $\Delta^k(n)^{k+1}$ is $\pi\sqrt{n}$.

If $n = 9$, this approximate length is 9.4, assuming $k$ fairly large, whereas the
more exact length is 9.2.
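
Formula (21) and the approximation (22) can be compared with a direct computation from the coefficients. The Python sketch below is an illustration only; the helper names are hypothetical, and independent equal-variance data are assumed so that (5) applies to the coefficient list.

```python
from math import acos, pi, sqrt


def moving_sum(coeffs, n):
    """Coefficients after summing n consecutive terms of the current sequence."""
    out = [0] * (len(coeffs) + n - 1)
    for i, a in enumerate(coeffs):
        for j in range(n):
            out[i + j] += a
    return out


def difference(coeffs):
    """Formula (7): b_1 = -a_1, b_r = a_{r-1} - a_r, b_{n+1} = a_n."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]


def cos_theta(coeffs):
    """Formula (5): sum of adjacent products divided by the sum of squares."""
    adjacent = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    return adjacent / sum(a * a for a in coeffs)


def cos_theta_direct(n, k):
    """cos(theta) for k differencings after k + 1 summings by n, built explicitly."""
    coeffs = [1]
    for _ in range(k + 1):
        coeffs = moving_sum(coeffs, n)
    for _ in range(k):
        coeffs = difference(coeffs)
    return cos_theta(coeffs)


def cos_theta_formula(n, k):
    """Right member of (21)."""
    return 1 - (2 * k + 1) / (n * (k + 1))


def cycle_length(cos_value):
    """Cycle length 2*pi / theta for a given cos(theta)."""
    return 2 * pi / acos(cos_value)


if __name__ == "__main__":
    n = 9
    print(cos_theta_direct(n, 4), cos_theta_formula(n, 4))
    print(pi * sqrt(n), cycle_length(1 - 2 / n))  # approximation (22) vs. limiting value
```

The second printed pair corresponds to the two lengths quoted in the text for n = 9.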

Consider now the result of summing $k + 2$ times, and then differencing only $k$
times. For this purpose, a few formulas for summing squares will be useful.
By the method of differences it can be shown that if $l = a + nh$, and

(23) $T = a^2/2 + (a + h)^2 + (a + 2h)^2 + \cdots + (a + \overline{n-1}\,h)^2 + l^2/2,$

then

(24) $T = n(a^2 + al + l^2)/3 + (l - a)^2/6n.$

Suppose, now, that $a/n$ takes on the values $0,\ {}_kC_0,\ -{}_kC_1,\ \cdots,\ (-1)^k\,{}_kC_k$ in
succession, while $l/n$ takes on the values ${}_kC_0,\ -{}_kC_1,\ \cdots,\ (-1)^k\,{}_kC_k,\ 0$. Let $U$
be the sum of the $(k + 2)$ values of $T$ thus obtained. Then by (10),

(25) $U = n^3\,(2 \cdot {}_{2k}C_k - {}_{2k}C_{k-1})/3 + n \sum_{r=0}^{k+1} ({}_{k+1}C_r)^2/6$

(26) $\phantom{U} = \dfrac{n^3}{3}\,\dfrac{(k+2)(2k)!}{k!\,(k+1)!} + \dfrac{n}{6}\,{}_{2k+2}C_{k+1}.$

Now, by applying to (16) one more summation by $n$, there are formed $(k + 2)$
arithmetic progressions of $(n + 1)$ terms each, alternately increasing and
decreasing. The maximum and minimum terms at the juncture of the progressions
are to be split into two halves to apply (23). Then the sum of the squares of
these coefficients is given by (26). This forms a denominator for (5).

To obtain the numerator for (5) we note that from $ab = [a^2 + b^2 - (a - b)^2]/2$
it follows that if

(27) $V = a(a + h) + (a + h)(a + 2h) + \cdots + (a + \overline{n-1}\,h)(a + nh);$

then, from (23),

(28) $V = T - nh^2/2 = T - (l - a)^2/2n.$

If now $W$ is the sum of such $V$'s, reference to the last terms of (24) and (26)
shows that

(29) $W = U - (n/2)\,{}_{2k+2}C_{k+1}.$

And hence, from (5),

(30) $\cos\theta = \dfrac{(k + 2)n^2 - 4k - 2}{(k + 2)n^2 + 2k + 1}.$

Then

(31) $\cos\theta > \dfrac{n^2 - 4}{n^2 + 2},$

but only slightly greater when $k$ is large. Again

(32) $\cos\theta > 1 - 6/n^2;$

but only slightly greater when $n$ is not small. In this case, $\cos\theta = 1 - \theta^2/2$,
approximately. And thus, approximately, for large $k$, and for $n$ not small

(33) cycle length for definite change of sign of $\Delta^k(n)^{k+2}$ = $1.81n$.

This gives for $n = 10$ a cycle length of 18.1; whereas, if $\cos\theta$ is taken as the
right member of (31), the cycle length is 18.2.
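
The closed form (30) can be compared with a direct computation of (5) from the coefficients of k differencings applied to k + 2 summings. The Python sketch below is an illustration only; all helper names are hypothetical, and independent equal-variance data are assumed.

```python
from math import acos, pi, sqrt


def moving_sum(coeffs, n):
    """Coefficients after summing n consecutive terms of the current sequence."""
    out = [0] * (len(coeffs) + n - 1)
    for i, a in enumerate(coeffs):
        for j in range(n):
            out[i + j] += a
    return out


def difference(coeffs):
    """Formula (7): b_1 = -a_1, b_r = a_{r-1} - a_r, b_{n+1} = a_n."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]


def cos_theta(coeffs):
    """Formula (5): sum of adjacent products divided by the sum of squares."""
    adjacent = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    return adjacent / sum(a * a for a in coeffs)


def cos_theta_direct(n, k):
    """cos(theta) for k differencings after k + 2 summings by n, built explicitly."""
    coeffs = [1]
    for _ in range(k + 2):
        coeffs = moving_sum(coeffs, n)
    for _ in range(k):
        coeffs = difference(coeffs)
    return cos_theta(coeffs)


def cos_theta_formula(n, k):
    """Right member of (30)."""
    return ((k + 2) * n**2 - 4 * k - 2) / ((k + 2) * n**2 + 2 * k + 1)


def cycle_length(cos_value):
    """Cycle length 2*pi / theta for a given cos(theta)."""
    return 2 * pi / acos(cos_value)


if __name__ == "__main__":
    n = 10
    print(cos_theta_direct(n, 3), cos_theta_formula(n, 3))
    limit = (n**2 - 4) / (n**2 + 2)  # right member of (31)
    print(2 * pi / sqrt(12) * n, cycle_length(limit))  # (33) vs. (31)
```

The factor 1.81 in (33) is 2π/√12, which follows from setting θ²/2 = 6/n².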

Thus, if a $(k + 2)$-fold summation or averaging of random data is followed




THE CYCLIC EFFECTS OF LINEAR GRADUATIONS PERSISTING IN 
THE DIFFERENCES OF THE GRADUATED VALUES 

By Edward L. Dodd
University of Texas 

1. Scope of inquiry. Slutzky [1] applied the moving sum, the repeated
moving sum, and other linear processes to random numbers obtained from
lottery drawings. But the graph of the moving sum becomes, when the vertical
scale is changed in the ratio of $n$ to 1, the graph of the moving average, the simplest
form of graduation. When cyclic effects are studied, there is no essential
difference between a moving sum and a moving average, nor between a general linear
process with coefficients $a_1, a_2, \cdots, a_n$, having sum $A \ne 0$ and the
corresponding graduation, with coefficients $a'_r = a_r/A$. Thus Slutzky's work throws
considerable light upon graduation, although his main interest was in summation.

Slutzky found that the graphs of moving sums of random numbers bore
strong resemblance to graphs of economic phenomena, such as [1, p. 110] that
of English business cycles from 1855 to 1877. In fact, Slutzky regards the
fluctuations in economic phenomena as due largely to a synthesizing of random
causes.

In general the undulatory character of such values cannot be described as
periodic; since the waves are of different length. But Slutzky found that, upon
operating on random data having mean zero and constant variance, the resulting
values approach a sinusoidal limit under certain conditions, in particular, when
a set of $n$ summations by twos is followed by $m$ differencings, and as $n \to \infty$,
$m/n \to$ a constant. Romanovsky [2] generalized this result by taking successive
summations of $s$ consecutive elements of the data, with $s \ge 2$; but required that
$m/n \to \alpha \ne 1$. However, the cases which are of interest to me just now are
those for which $m = n - 1$ or $m = n - 2$; and for these cases $m/n \to 1$.
Romanovsky considers the case of $m = n - 1$, not, however, as leading to a
sinusoidal limit, and gives in formula (46) the value of a coefficient of
correlation, which I deduce directly. From his formula (43) a corresponding
coefficient of correlation can be obtained for the case of $m = n - 2$, as the sum of
certain products. A more simple expression than this I need, which I obtain
directly. In my treatment, these coefficients are the cosines of angles; and the
ratio of such an angle to a whole revolution is an expected frequency of
occurrence.

After setting forth in Section 2 some preliminary formulas, I treat in Section 3
the results of applying to random data an indefinite number $k + 2$ of
summations or averagings, followed by $k$ differencings, the number of terms in a sum
remaining fixed. In Section 4, however, only a few differencings are applied to a
graduation. In particular the Spencer 21-term formula is studied in some
detail. In former papers [3, 4] I have dealt with the immediate effects of
graduations upon random data.

The question to be considered in this paper is this: Do the cyclic effects
appearing in the graduated values persist in the successive differences? And, if so, do
these effects fade out gradually or on the other hand, do they come to a rather abrupt
termination?

These differences of graduated values, indeed, up to the third, fourth or fifth
are of considerable importance. Henderson [5] defines the smoothing coefficient
of a given graduation as the ratio of the theoretical standard deviation of the
third differences for the graduated values to that for the original values or data.

2. Preliminary notions and formulas. The data to be graduated will be
supposed to be independent, or uncorrelated, or as Slutzky expresses it,
"incoherent." This will imply that the expected value of the product of two different
chance variates is the product of their expected values.

Now the operations of summing and differencing as used here are not inverse.
To illustrate: Given as independent $u, v, w, x, y, z, \cdots$. Summing by twos
yields the sequence $u + v, v + w, w + x, x + y, y + z, \cdots$. But the first
differences of these numbers, $w - u, x - v, y - w, z - x, \cdots$, are alternately
correlated, thus $w - u$ is negatively correlated with $y - w$; $x - v$ with $z - x$,
etc. Indeed, successive differencing following successive summing does not lead
back to the original condition of incoherency. However, under certain
conditions, the resulting coherency may be so slight that the final succession of
numbers may have just about the same chaotic properties as the succession of data.
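
The summing-by-twos illustration can be traced on the coefficients alone. The Python sketch below is an illustration only; the helper names are hypothetical, and the correlation at a given lag is computed as the overlap of the coefficient list with its shifted copy, which assumes independent data of equal variance.

```python
def moving_sum(coeffs, n):
    """Coefficients after summing n consecutive terms of the current sequence."""
    out = [0] * (len(coeffs) + n - 1)
    for i, a in enumerate(coeffs):
        for j in range(n):
            out[i + j] += a
    return out


def difference(coeffs):
    """Coefficients of the first difference of the current linear process."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]


def lag_correlation(coeffs, lag):
    """Correlation between two output values `lag` places apart."""
    overlap = sum(a * b for a, b in zip(coeffs, coeffs[lag:]))
    return overlap / sum(a * a for a in coeffs)


if __name__ == "__main__":
    # Sum by twos, then difference once: (u + v, v + w, ...) -> (w - u, x - v, ...)
    coeffs = difference(moving_sum([1], 2))
    print(coeffs)                      # not [1]: the two operations are not inverse
    print(lag_correlation(coeffs, 1))  # neighbours such as w - u and x - v
    print(lag_correlation(coeffs, 2))  # alternate terms such as w - u and y - w
```

Neighbouring differences share no datum, while alternate ones share one datum with opposite sign, which is the "alternately correlated" pattern described above.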

In my paper [3, p. 262], I set forth a number of features on the basis of which
a cycle length could be defined. One of these involves the frequency of maxima.
Given independent chance variables, each subject to the same law of
distribution,

(1) $P(x_i \le x) = \Phi(x);$

where $\Phi(x)$ has a derivative $\phi(x)$. It is then easy to see that the expected
relative frequency of maxima is 1/3. That is:

(2) $P(x_{i-1} \le x_i \ge x_{i+1}) = \int_{-\infty}^{\infty} [\Phi(x)]^2 \phi(x)\,dx = 1/3.$
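
The value 1/3 in (2) can be confirmed in two independent ways. The Python sketch below is an illustration only; the function names are hypothetical, and the unit normal is used merely as one convenient choice of Φ, since (2) does not depend on the particular continuous distribution.

```python
from itertools import permutations
from math import erf, exp, pi, sqrt


def maxima_fraction_by_ranks():
    """Fraction of orderings of three distinct values whose middle one is largest."""
    orderings = list(permutations(range(3)))
    hits = sum(1 for p in orderings if p[1] > p[0] and p[1] > p[2])
    return hits / len(orderings)


def normal_integral(steps=20000, lo=-8.0, hi=8.0):
    """Midpoint-rule value of the integral of Phi(x)^2 * phi(x) for the unit normal."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        cdf = 0.5 * (1.0 + erf(x / sqrt(2.0)))
        pdf = exp(-0.5 * x * x) / sqrt(2.0 * pi)
        total += cdf * cdf * pdf * h
    return total


if __name__ == "__main__":
    print(maxima_fraction_by_ranks(), normal_integral())
```

The rank argument works because, for continuous independent variables with a common distribution, all six orderings of three consecutive values are equally likely.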

Now, for a given feature, a cycle length is defined as the reciprocal of the theoretic
relative frequency. Then the cycle length here for maxima is three. It is well
known that averaging tends to remove maxima. Thus, upon averaging or
summing, the cycle length tends to increase. It is almost as well known that
differencing tends to increase the frequency of maxima, and thus decrease cycle
length. For if $z_i = \Delta y_i = y_{i+1} - y_i$, then between two maxima of $y_i$, there is
at least one minimum (strong and weak) of $y_i$; and following this minimum and
before passing the next maximum of $y_i$ there is at least one maximum of $z_i$.
Successive differencing tends to reduce the cycle length of maxima from 3 to 2,





by only $k$ differencings, the resulting graduation or linear processing $z = \Delta^k(n)^{k+2}$
is decidedly not as chaotic as the data; as seen from (31) and (33). But further,
$\Delta z = \Delta^{k+1}(n)^{k+2}$; and thus from (22) the cycle length for the expected maxima
of $z$ is about $\pi\sqrt{n}$.

Now Slutzky [1, p. 109] distinguished conspicuous waves from inconsequential
"ripples." On this basis, the frequency of significant cyclical features for a
chance variable, such as $z$, would be less than the frequency of the maxima. It
is not so clear that the frequency of significant features of a chance variable
will be greater than that for changes of sign in definite direction. That turned
out to be true for graduated values such as discussed in my earlier paper
[3, p. 262]. If this be also valid for $z$, we would expect that conspicuous "waves"
of $\Delta^k(n)^{k+2}$ would have average length between $\pi\sqrt{n}$ and $1.81n$, except for small
values of $n$ and $k$.

4. Graduations or linear processes and their successive differences. If double
summation by n is followed by a single differencing, the result, as indicated in
(14), is, for n = 3,

(34) $y_i = -x_i - x_{i+1} - x_{i+2} + x_{i+3} + x_{i+4} + x_{i+5}$.

Then

(35) $y_{i+3} = -x_{i+3} - x_{i+4} - x_{i+5} + x_{i+6} + x_{i+7} + x_{i+8}$.

Thus $y_i$ and $y_{i+3}$ are negatively correlated, since $x_{i+3}$, $x_{i+4}$, and $x_{i+5}$ appear
in each, but with sign changed. This would seem to tend to make maxima
alternate with minima at distances of about 3; or at distances of n, in the general
case (14). Here, following Slutzky and Romanovsky, the coefficient of correlation
$r_p$ between elements at a distance of p is taken as

(36) $r_p = E(x_r \cdot x_{r+p}) / E(x_r^2)$.

Using computed averages, instead of expected values, Alter [6] recommends
a "correlation periodogram," in which $r_p$ is the ordinate for abscissa p.
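The negative correlation between $y_i$ and $y_{i+3}$ can be read off the coefficients of (34). The sketch below applies (36) to a linear process on independent data with zero mean and constant variance, so that the variance cancels; the function name is an illustrative assumption.

```python
def lag_correlation(coeffs, p):
    """r_p of (36) for y_i = sum_j coeffs[j] * x_{i+j}, where the x are
    independent with zero mean and constant variance."""
    numerator = sum(a * b for a, b in zip(coeffs, coeffs[p:]))
    denominator = sum(a * a for a in coeffs)
    return numerator / denominator

# Coefficients of (34): double summation by 3, then one differencing.
coeffs_34 = [-1, -1, -1, 1, 1, 1]

r3 = lag_correlation(coeffs_34, 3)
print(r3)
```

Three of the six terms are shared between (34) and (35), each with opposite sign, which gives $r_3 = -3/6$.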

Moreover, we would expect a graduation (4) with coefficients $a_i$ proportional
to the ordinates y of the sinusoid $y = \sin(\alpha + 2\pi x/p)$ taken for x = 1, 2, 3, ...
to impress upon random data oscillations with maxima separated from minima
by about p/2. But such $a_i$, as well as those in (34), have abrupt endings which
introduce noticeable alterations. More satisfactory results come from tapering
ends, such as appear in damped vibration, with coefficients about proportional
to $e^{-c|x|} \cos 2\pi x/p$ or to $e^{-c|x|} \sin 2\pi x/p$. H. Labrouste and Mrs. Labrouste [7]
give a powerful operator of this description.

Slutzky (loc. cit. pp. 119-123), Yule [8], and Walker [9] make use of damped
harmonic vibration to explain the creation of cycles; while Bartels [10]
approaches by a different method the oscillations that do not last.

Now the common graduation formulas have coefficients not conforming strictly
to damped vibration, as the tapering ends vibrate more quickly. However,
these ends have little more than a smoothing or stabilizing effect. Furthermore,



EDWARD L. DODD


the coefficients for first differences are likely to conform to something like
$\sin 2\pi x/p$. Some experimental evidence will be presented for the following
conclusion:

If the coefficients $a_i$ of a graduation, or linear process (4), appear to conform
roughly to equidistant ordinates of a damped vibration, $\pm e^{-c|x|} \cos 2\pi x/p$ or
$\pm e^{-c|x|} \sin 2\pi x/p$, with changes of sign at intervals of p/2, then when this process
(4) is applied to independent chance data having zero mean and constant variance,
there is a tendency for the graduated or processed values to change sign at intervals
of about p/2.
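The conclusion can be tried on artificial data. The following sketch is an illustration under assumed values (p = 12, c = 0.1, 61 coefficients, normal data, a fixed seed), none of which come from the paper; it applies damped-cosine coefficients to random data and measures the mean spacing of sign changes.

```python
import math
import random

def damped_cosine_coeffs(p, c, half_width):
    """Ordinates of exp(-c|x|) cos(2 pi x / p) for x = -half_width..half_width."""
    return [
        math.exp(-c * abs(x)) * math.cos(2 * math.pi * x / p)
        for x in range(-half_width, half_width + 1)
    ]

def apply_linear_process(coeffs, data):
    """y_i = sum_j coeffs[j] * data[i + j], for every full window."""
    m = len(coeffs)
    return [
        sum(a * data[i + j] for j, a in enumerate(coeffs))
        for i in range(len(data) - m + 1)
    ]

def mean_sign_change_interval(values):
    """Average number of steps between successive changes of sign."""
    changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
    return (len(values) - 1) / changes

rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

p = 12
y = apply_linear_process(damped_cosine_coeffs(p, c=0.1, half_width=30), data)
interval = mean_sign_change_interval(y)
print(round(interval, 2), "compared with p/2 =", p / 2)
```

A larger damping constant c shortens the effective span of the coefficients and loosens the agreement with p/2.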

A number of standard graduations have first and second differences, see (6),
(7), which bear a decided resemblance to damped vibrations, while the third or
fourth differences have only moderate, if any, cyclic appearance. This is
especially true of those graduations which are constructed by applying three
summings (the number of terms in a sum being in general different) and a fourth


TABLE I

Coefficients (×350) for Spencer 21-term graduation and for first four differences.
Also theoretical cycle lengths for change in sign in values obtained from
random data

Graduation (cycle length 10.7):
  -1, -3, -5, -5, -2, +6, +18, +33, +47, +57, +60, +57, +47, +33, +18, +6, -2, -5, -5, -3, -1

1st D (cycle length 7.0):
  +1, +2, +2, 0, -3, -8, -12, -15, -14, -10, -3, +3, +10, +14, +15, +12, +8, +3, 0, -2, -2, -1

2nd D (cycle length 5.5):
  -1, -1, 0, +2, +3, +5, +4, +3, -1, -4, -7, -6, -7, -4, -1, +3, +4, +5, +3, +2, 0, -1, -1

3rd D (cycle length 3.2):
  +1, 0, -1, -2, -1, -2, +1, +1, +4, +3, +3, -1, +1, -3, -3, -4, -1, -1, +2, +1, +2, +1, 0, -1

4th D (cycle length 1.6):
  -1, +1, +1, +1, -1, +1, -3, 0, -3, +1, 0, +4, -2, +4, 0, +1, -3, 0, -3, +1, -1, +1, +1, +1, -1


process with negative coefficients. This is, indeed, a favorite form of graduation,
with which are associated the names of Woolhouse, Spencer, Higham,
Kenchington, Henderson, etc. The Spencer 21-term formula, for which some
features have already been described [3, p. 262], will now be examined, with
special reference to its differences. Cycle length for change of sign is one-half
that for change from minus to plus.

In the graduation formula itself, there are 11 positive coefficients, centrally
located, and relatively large as compared with the negative coefficients. This
11 is close to 10.7, the theoretical cycle length for changes of sign of $y_r - 4.5$,
the difference between the graduated value $y_r$ and its mean, the arithmetic
mean of 0, 1, 2, ..., 9. The structure of the first and second differences also





matches closely the corresponding cycle lengths. In the third differences, there
is a break at the center; but still there appears considerable regularity. But
among fourth differences, the zigzag is the prominent feature. Now the theorem
of Section 3 does not really apply to the Spencer formula, with its two
summations by fives and one summation by sevens, and another process. But it is not
surprising that the cyclicity ceases after passing the third differences.
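The rows of Table I can be regenerated from the Spencer coefficients alone. The sketch below is a hedged illustration: it assumes normally distributed data, for which the expected spacing of sign changes is $\pi / \arccos r_1$, with $r_1$ the lag-one correlation computed from the coefficients. That formula is a standard result standing in for expressions (5) and (8), which are not reproduced in this section, and the function names are illustrative.

```python
import math

# Spencer 21-term coefficients, multiplied by 350.
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

def difference_coeffs(coeffs):
    """Coefficients of y_{r+1} - y_r on the data, given those of y_r
    (each new coefficient is previous minus next, with zero padding)."""
    padded = [0] + list(coeffs) + [0]
    return [a - b for a, b in zip(padded, padded[1:])]

def lag1_correlation(coeffs):
    """r_1 for a linear process on independent, equal-variance data."""
    numerator = sum(a * b for a, b in zip(coeffs, coeffs[1:]))
    return numerator / sum(a * a for a in coeffs)

def sign_change_cycle_length(coeffs):
    """Expected spacing of sign changes, assuming normal data."""
    return math.pi / math.acos(lag1_correlation(coeffs))

rows = [SPENCER_21]
for _ in range(4):
    rows.append(difference_coeffs(rows[-1]))

cycle_lengths = [sign_change_cycle_length(row) for row in rows]
for name, length in zip(["grad.", "1st D", "2nd D", "3rd D", "4th D"], cycle_lengths):
    print(name, round(length, 1))
```

The same three functions can be pointed at any other graduation formula to build a table of this kind.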

As a basis for comparing observed values with expected values, the tenth
digits in the 600 logarithms from log 200 to log 799 were taken as a random set
of numbers. These 600 numbers had been given a Spencer 21-term graduation
[3, pp. 261-262], yielding 580 graduated values. From these the 579 first
differences were found, the 578 second differences, etc. These numbers, 580, 579, ...,
were multiplied respectively by the expected relative frequencies of change in
sign of $y_r - 4.5$, of $\Delta y_r$, $\Delta^2 y_r$, etc., as found by use of (5), (8), and similar
expressions, to form the following table.

TABLE II

Comparison of expected changes of sign with observed changes for a Spencer
21-term graduation

                            Expected Number of      Observed Number of
                            Changes from - to +     Changes from - to +
Graduated values - 4.5             27.2                     27
First differences                  41.3                     42
Second differences                 52.9                     48
Third differences                  90.4                     74
Fourth differences                176.7                    146

The most abrupt change in frequency or cycle length appears to occur in
passing from third to fourth differences. In Table I, this is seen in the
configuration of positive and negative terms, and in the drop from 3.2 to 1.6 in cycle
length; and in Table II in the corresponding increase in expected sign changes
from 90.4 to 176.7. More spectacular is the increase in the number of zigzags
represented by -, +, -, +. Among the third differences, there were
found only 13 instances of four successive terms with signs as just indicated,
whereas among fourth differences there were found 75 such instances. For
random material, about 36 such zigzags would be expected, decidedly more than
found among the third differences, and decidedly less than found among the
fourth differences.
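The figure of about 36 zigzags for random material follows from a short count. The sketch below assumes that "random material" means independent signs, each equally likely to be plus or minus, and that overlapping windows of four terms are counted; the function name is illustrative.

```python
def expected_pattern_count(n_terms, pattern_length=4):
    """Expected number of windows showing one fixed sign pattern, when
    signs are independent and each is + or - with probability 1/2."""
    windows = n_terms - pattern_length + 1
    return windows / 2 ** pattern_length

# 580 graduated values leave 577 third and 576 fourth differences.
third = expected_pattern_count(577)
fourth = expected_pattern_count(576)
print(third, fourth)
```

Both values round to 36, against the 13 and 75 instances actually observed.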

The Spencer 21-term graduation appears to be fairly representative of
commonly used graduations as regards regularity or irregularity in the distribution
of positive and negative coefficients among the differences. For graduations
with a much larger number of terms, the alternation of sign in fourth
differences may not be so rapid, as, e.g., in the 35-term 5th degree parabolic
graduation which Macaulay [11] calls No. 18. On the other hand, for a formula with
non-tapering ends, such as the 13-term formula which Macaulay gives [11,
p. 64], the coefficients appearing in the differences are more irregular, especially
at the ends. While the Spencer formula is fairly representative, different
formulas have distinguishing features. If it is desirable to form an idea of what a
given formula will do to random data, a table like Table I can be constructed.

5. Summary. When upon independent chance data, summing, averaging or
some more general graduation process is used, the graduated values tend to
assume a wavy configuration. These waves often seem to have a fair amount
of regularity or cyclicity. The first differences usually, and often other
differences of the graduated values, are decidedly cyclic. But, as we go in turn to
the higher differences, the cyclicity may weaken. Indeed there may be a return
to something like randomness. And subsequent differencings may tend to set
up zigzags.

If (k + 2) successive summings by n have been performed on independent
chance data, with n not too small, say $n \ge 5$, then k + 2 differencings will
just about bring back the original chaotic or random condition. But with only
k or (k + 1) differencings, a definite cyclicity remains at least theoretically, in
the expected values.

In the case of the Spencer 21-term graduation, the coefficients for the
successive differences indicate the appearance of cyclicity in first, second, and third
differences.


REFERENCES

[1] Eugen Slutzky, "The summation of random causes as the source of cyclic processes,"
Econometrica, Vol. 5 (1937), pp. 105-146. This supplements an earlier paper
(1927) in Russian.

[2] V. Romanovsky, "Sur la loi sinusoïdale limite," Rendiconti del Circolo Matematico
di Palermo, Vol. 56 (1932), pp. 82-111.

[3] Edward L. Dodd, "The length of the cycles which result from the graduation of
chance elements," Annals of Math. Stat., Vol. 10 (1939), pp. 254-264.

[4] Edward L. Dodd, "The problem of assigning a length to the cycle to be found in a
simple moving average and in a double moving average of chance data,"
Econometrica, Vol. 9 (1941), pp. 25-37.

[5] Robert Henderson, Graduation of Mortality and Other Tables, Actuarial Society of
America, New York, 1919.

[6] Dinsmore Alter, "A group or correlation periodogram, with applications to the
rainfall of the British Isles," Monthly Weather Review, Vol. 56 (1927), pp. 263-266.

[7] H. and Mrs. Labrouste, "Harmonic analysis by means of linear combinations of
ordinates," Terr. Mag. and Atmos. Elec., Vol. 41 (1936), pp. 17-18.

[8] G. Udny Yule, "On a method of investigating periodicities in disturbed series, with
special reference to Wolfer's sunspot numbers," Phil. Trans. A, Vol. 226 (1927),
pp. 267-298.

[9] Sir Gilbert Walker, "On periodicity in series of related terms," Roy. Soc. Proc.,
Ser. A, Vol. 131 (1931), pp. 518-532.

[10] J. Bartels, "Random fluctuations, persistence, and quasi-persistence in geophysical
and cosmical periodicities," Terr. Mag. and Atmos. Elec., Vol. 40 (1935), pp.
1-60.

[11] F. R. Macaulay, The Smoothing of Time Series, Publication of the National Bureau
of Economic Research, No. 19, New York, 1931.



ON THE DISTRIBUTION OF WILKS' STATISTIC FOR TESTING THE
INDEPENDENCE OF SEVERAL GROUPS OF VARIATES

By A. Wald$^1$ and R. J. Brookner$^1$

Columbia University

1. Introduction. We consider p variates $x_1, x_2, \dots, x_p$ which have a joint
normal distribution. Let the variates be divided into k groups; group one
containing $x_1, \dots, x_{p_1}$, group two containing $x_{p_1+1}, x_{p_1+2}, \dots, x_{p_2}$, etc. We
are interested in testing the hypothesis that the set of all population correlation
coefficients between any two variates which belong to different groups is zero.

Wilks$^2$ has derived, by using the Neyman-Pearson likelihood ratio criterion, a
statistic based on N independent observations on each variate with which one
may test this hypothesis. Let $\|r_{ij}\|$ be the matrix of sample correlation
coefficients; Wilks' statistic, $\lambda$, is the ratio of the determinant of the p-rowed
matrix of sample correlations to the product of the $p_1$-rowed determinant of
correlations of the variates of group one, the $(p_2 - p_1)$-rowed determinant of
correlations of the second group, etc. That is

$\lambda = \dfrac{|r_{ij}|}{|r_{\alpha_1\beta_1}| \cdot |r_{\alpha_2\beta_2}| \cdots |r_{\alpha_k\beta_k}|}$

where $|r_{\alpha_i\beta_i}|$ is the principal minor of $|r_{ij}|$ corresponding to the i'th group.

In order to use the test, the distribution function of $\lambda$ must be known. Wilks
has shown that in certain cases the exact distribution is a simple elementary
function; in other cases it is an elementary function, but one which is rather
unwieldy and which does not lend itself readily to practical use. It is our
purpose in this paper (1) to show a method by which the exact distribution can
be explicitly given as an elementary function for a certain class of groupings of
the variates, and (2) to give an expansion of the exact cumulative distribution
function in an infinite series which is applicable to any grouping.
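The definition of the statistic can be made concrete with a small computation. The sketch below is illustrative only: the correlation matrix is made up, the function names are assumptions, and the determinant routine is a plain Gaussian elimination written out so that the example is self-contained.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def wilks_statistic(corr, group_sizes):
    """Determinant of the full correlation matrix divided by the product
    of the determinants of the diagonal blocks, one block per group."""
    product = 1.0
    start = 0
    for size in group_sizes:
        block = [row[start:start + size] for row in corr[start:start + size]]
        product *= determinant(block)
        start += size
    return determinant(corr) / product

# Hypothetical sample correlations for p = 3 variates, grouping (2, 1).
corr = [[1.0, 0.5, 0.3],
        [0.5, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
print(wilks_statistic(corr, [2, 1]))
```

When every correlation between variates of different groups is zero, the full determinant factors into the block determinants and the statistic equals one.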

2. The exact distribution of $\lambda$. By the method to be described, the exact
distribution of $\lambda$ can be found when the numbers of variates in the groups are
such that there are an odd number in at most one group. If the number of
variates is small, say at most eight, the method will increase only slightly the
list of distribution functions that Wilks gives in his paper.

$^1$ Research under a grant-in-aid of the Carnegie Corporation of New York.

$^2$ S. S. Wilks, "On the independence of k sets of normally distributed statistical
variables," Econometrica, Vol. 3 (1936), pp. 309-326. Other references to Wilks in this paper
except where otherwise noted are to this publication.



For purposes of deriving the distribution of $\lambda$ we may assume that $E(x_u) = 0$
$(u = 1, 2, \dots, p)$; that there are $n = N - 1$ independent observations
$x_{u\alpha}$ $(\alpha = 1, 2, \dots, n)$ on each variate $x_u$; and that the sample covariance
between $x_i$ and $x_j$ is given by $s_{ij} = \sum_{\alpha=1}^{n} x_{i\alpha} x_{j\alpha}/n$. We define $u'$ (a function of $u$)
to be the total number of variables in all the groups which precede the group in
which $x_u$ lies. The complete theory is independent of the ordering of the groups
and of the ordering of the variates within the groups; hence without loss of
generality, we may assume that if any group contains an odd number of variates,
it will be the last group; hence $u'$ is always an even integer.

Wilks has shown that $\lambda$ is a product $\prod_{u=p_1+1}^{p} z_u$ where each $z_u$ is distributed
independently of the others, and that the distribution of $z_u$ is

(1) $\dfrac{z_u^{\frac12(n-u-1)} (1 - z_u)^{\frac12(u'-2)}}{B[\frac12(n-u+1),\, u'/2]}\, dz_u$.

Now let $y_u = \log z_u$, then the characteristic function of $y_u$ is

$\phi_u(t) = \dfrac{1}{B[\frac12(n-u+1),\, u'/2]} \int_0^1 e^{t \log z_u} z_u^{\frac12(n-u-1)} (1 - z_u)^{\frac12(u'-2)}\, dz_u$

$\qquad = \dfrac{1}{B[\frac12(n-u+1),\, u'/2]} \int_0^1 z_u^{\frac12(n-u-1)+t} (1 - z_u)^{\frac12(u'-2)}\, dz_u$

where t is a pure imaginary. It is known$^3$ that this integral, even with complex
exponents, is the Beta-function so long as the real parts of both exponents are
greater than minus one, so

$\phi_u(t) = \dfrac{B[\frac12(n-u+1)+t,\, u'/2]}{B[\frac12(n-u+1),\, u'/2]} = \dfrac{\Gamma[\frac12(n-u+1)+t]\,\Gamma[\frac12(n-u+1+u')]}{\Gamma[\frac12(n-u+1+u')+t]\,\Gamma[\frac12(n-u+1)]}$.

But here $u'$ is always an even integer, hence by the well known recursion formula
of the Gamma-function, which is valid for complex arguments excluding only
negative integers,

$\phi_u(t) = c_u \{[\tfrac12(n-u+1)+t][\tfrac12(n-u+3)+t] \cdots [\tfrac12(n-u+u'-1)+t]\}^{-1}$

where

$c_u = [\tfrac12(n-u+1)][\tfrac12(n-u+3)] \cdots [\tfrac12(n-u+u'-1)]$.


$^3$ See Whittaker and Watson, A Course in Modern Analysis, Fourth edition 1927, Chap. 12.



Wilks’ statistic 


139 


Now set

$y = \log \lambda = y_{p_1+1} + y_{p_1+2} + \cdots + y_p$

and the characteristic function of y is

$\phi(t) = \prod_{u=p_1+1}^{p} c_u \{[\tfrac12(n-u+1)+t][\tfrac12(n-u+3)+t] \cdots [\tfrac12(n-u+u'-1)+t]\}^{-1}$.

From the characteristic function, we can obtain the distribution function,
$g(y)$, of y by the relation

(2) $g(y) = \dfrac{c_n}{2\pi i} \int_{-i\infty}^{i\infty} \dfrac{e^{-yt}\, dt}{\prod_{u=p_1+1}^{p} [\frac12(n-u+1)+t] \cdots [\frac12(n-u+u'-1)+t]}$

where

$c_n = \prod_{u=p_1+1}^{p} c_u$.

The integration can be carried out by the method of residues; since y is always
negative (the range of $\lambda$ is from 0 to 1), on a half circle with center at the origin
in the negative half of the complex t-plane, the integral of the integrand $\Phi(t)$ of (2)
converges to zero as the radius of the circle becomes infinite. Since $\Phi(t)$ is
analytic except for a finite number of poles on the negative real axis, $g(y)$ is $c_n$
times the sum of the residues at these points.

Now the integrand $\Phi(t)$ is of the form $e^{-yt}/P(t)$ where $P(t)$ is a polynomial in t as follows:
suppose that the groups contain $r_1, \dots, r_k$ variables respectively, then let
$(k_j + 1)$ be the number of these r's which are greater than or equal to j; then

P(t) = [J(n - 2) + tf l [%(n - 3) + i] fcl [£(n - 4) + - 5) + t) ki+k * 

[%(n - 6) + t] kl+k,+kl ... [J(n — v 4. 1) 4. +*tip]-n(p-»j > 

where 


Then 


[o-/2] = 


<r/2 if u is even 
(a — l)/2 if 0 - is odd. 


g(y; 


P-2 


, ri, f 2 , • ■ • , J-fc) c n £ gj a — l)) s “ + $00L— 


where 


9 a + 1 ft a + fta-2 + ■ ■ • + ft[j(a+2)]— [pa-1)] . 





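It can be shown that $g_\sigma$ is $\ge 0$ for $\sigma$ between 1 and p - 2. Thus we have
$g(y; r_1, r_2, \dots, r_k)$ and from it we can calculate $f(\lambda; r_1, r_2, \dots, r_k)$.

Suppose p = 8 and that the variables are divided into two groups of four each,
then we will calculate the distribution function $f(\lambda; 4, 4)$. Now

$g(y; 4, 4) = \dfrac{c_n}{2\pi i} \int_{-i\infty}^{i\infty} \dfrac{e^{-yt}\, dt}{[\frac12(n-2)+t][\frac12(n-3)+t][\frac12(n-4)+t]^2 [\frac12(n-5)+t]^2 [\frac12(n-6)+t][\frac12(n-7)+t]}$

and

$c_n = \dfrac{(n-2)(n-3)(n-4)^2(n-5)^2(n-6)(n-7)}{2^8}$.

Then

$g(y; 4, 4) = 16 c_n \Big[ -\dfrac{e^{\frac12(n-2)y}}{90} + \dfrac{e^{\frac12(n-3)y}}{6} + \dfrac{8 e^{\frac12(n-4)y}}{9} - \dfrac{8 e^{\frac12(n-5)y}}{9} - \dfrac{e^{\frac12(n-6)y}}{6} + \dfrac{e^{\frac12(n-7)y}}{90} - \dfrac{y e^{\frac12(n-4)y}}{3} - \dfrac{y e^{\frac12(n-5)y}}{3} \Big]$.

Since

$y = \log \lambda, \qquad dy = \dfrac{d\lambda}{\lambda}$,

we have

$f(\lambda; 4, 4) = \dfrac{16 c_n}{3} \Big[ -\dfrac{\lambda^{\frac12(n-4)}}{30} + \dfrac{\lambda^{\frac12(n-5)}}{2} + \dfrac{8\lambda^{\frac12(n-6)}}{3} - \dfrac{8\lambda^{\frac12(n-7)}}{3} - \dfrac{\lambda^{\frac12(n-8)}}{2} + \dfrac{\lambda^{\frac12(n-9)}}{30} - (\lambda^{\frac12(n-6)} + \lambda^{\frac12(n-7)}) \log \lambda \Big]$.

The cumulative distribution function is given by

$J_w(4, 4) = \text{Prob}[\lambda \le w; 4, 4]$

$\quad = \dfrac{16 c_n}{3} \Big[ \dfrac{w^{\frac12(n-7)}}{15(n-7)} - \dfrac{w^{\frac12(n-6)}}{n-6} - \dfrac{4(4n-23) w^{\frac12(n-5)}}{3(n-5)^2} + \dfrac{4(4n-13) w^{\frac12(n-4)}}{3(n-4)^2} + \dfrac{w^{\frac12(n-3)}}{n-3} - \dfrac{w^{\frac12(n-2)}}{15(n-2)} - \Big( \dfrac{2 w^{\frac12(n-5)}}{n-5} + \dfrac{2 w^{\frac12(n-4)}}{n-4} \Big) \log w \Big]$.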

Wilks' expression for the cumulative distribution function appears to be quite
different, but if we substitute n = N - 1 and use the relation


r ^ - *)■«* 

= |(n - 2)(n - 3)(w - 4)(n - 6 ) 

[V'"- 5 ’ _ 3i0 4(n ~ 4) 3i0* c " _3) 

Ln— 5 n — 4 n — 3 n — 2J 





it can be shown that the two formulas for the cumulative distribution are
identical.
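A numerical check of the closed form for two groups of four is sketched below. It rests on two facts taken from the text: the statistic is a product of independent variates $z_u$ with the Beta distribution (1), and $u' = 4$ for $u = 5, \dots, 8$. The function names, the sample size and the seed are illustrative assumptions.

```python
import math
import random

def c_n_44(n):
    """c_n for the grouping (4, 4): the product of the c_u, u = 5..8."""
    return (n - 2) * (n - 3) * (n - 4) ** 2 * (n - 5) ** 2 * (n - 6) * (n - 7) / 256

def cdf_44(w, n):
    """Closed form for Prob[statistic <= w] with grouping (4, 4), n >= 8."""
    def pw(m):
        return w ** ((n - m) / 2)
    bracket = (pw(7) / (15 * (n - 7))
               - pw(6) / (n - 6)
               - 4 * (4 * n - 23) * pw(5) / (3 * (n - 5) ** 2)
               + 4 * (4 * n - 13) * pw(4) / (3 * (n - 4) ** 2)
               + pw(3) / (n - 3)
               - pw(2) / (15 * (n - 2))
               - (2 * pw(5) / (n - 5) + 2 * pw(4) / (n - 4)) * math.log(w))
    return 16 * c_n_44(n) / 3 * bracket

def simulated_cdf_44(w, n, trials, seed=0):
    """Monte Carlo estimate from the product of Beta variates in (1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        product = 1.0
        for u in (5, 6, 7, 8):
            # z_u has parameters (n - u + 1)/2 and u'/2 = 2.
            product *= rng.betavariate((n - u + 1) / 2, 2.0)
        hits += product <= w
    return hits / trials

print(cdf_44(1.0, 12))  # total probability
print(cdf_44(0.15, 12), simulated_cdf_44(0.15, 12, 100_000))
```

The closed form involves heavy cancellation between terms, so for large n it is safer to evaluate it in higher precision than ordinary floating point.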

In cases where $u'$ is not always an even integer, the exact distribution
function of $\lambda$ can still be obtained using this method. However, in such a case, the
gamma functions do not cancel out and the integrand has an infinitude of
poles, so the function is expressed by an infinite series. We will use a different
method to obtain an infinite series expansion.


3. A series expansion of the cumulative distribution function. Let us put
$v = -y$, and let the density function of v be $h(v)$, then from (2), we have

$h(v)\, dv = dv\, \dfrac{c_n}{2\pi i} \int_{-i\infty}^{i\infty} e^{vt} \prod_{u=p_1+1}^{p} \dfrac{\Gamma[\frac12(n-u+1)+t]}{\Gamma[\frac12(n-u+1+u')+t]}\, dt$.

Since v is a monotonic decreasing function of $\lambda$, and since the critical region for
testing the null hypothesis is given by the inequality $\lambda < \lambda_0$, then the critical
region will be defined by $v > v_0$, where $v_0$ is such that

$\int_{v_0}^{\infty} h(v)\, dv$

is equal to a chosen level of significance.

Proposition 1.

$h(v) = h_n(v)\, \bar\psi(v)$

where $\bar\psi(v)$ does not depend on n, and $h_n(v) = c_n e^{-\frac12 nv}$.

Proof: Let

$t' = t + \tfrac12(n - p)$.

Then

$h(v) = \dfrac{c_n}{2\pi i} \int_{-i\infty+\frac12(n-p)}^{i\infty+\frac12(n-p)} e^{v[t' - \frac12(n-p)]} \prod_u \dfrac{\Gamma[\frac12(p-u+1)+t']}{\Gamma[\frac12(p-u+u'+1)+t']}\, dt'$.

Now the area in the complex plane bounded by the vertical line through $\frac12(n-p)$,
by the vertical line through the origin, and by arcs of a circle with center at the
origin of arbitrary radius is one in which the integrand is everywhere regular.
Furthermore, the integral along the arcs approaches zero as the radius of the
circle approaches infinity, hence the integrals along the vertical line through
$\frac12(n-p)$ and along the vertical axis are equal. Then we may write

$\dfrac{e^{\frac12 nv}}{c_n}\, h(v) = \dfrac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{v(t' + p/2)} \prod_u \dfrac{\Gamma[\frac12(p-u+1)+t']}{\Gamma[\frac12(p-u+u'+1)+t']}\, dt' = \bar\psi(v)$.

Therefore

$h(v) = c_n e^{-\frac12 nv}\, \bar\psi(v)$.




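Proposition 2.

$I = \lim_{n\to\infty} \int_0^{\infty} \dfrac{c_n e^{-\frac12 nv} v^{r-1}}{\Gamma(r)}\, dv = 1$,

where we define

$r = \tfrac12 \sum_{i<j} r_i r_j$

so that

$r = \tfrac12 [r_2 r_1 + r_3(r_1 + r_2) + \cdots + r_k(r_1 + r_2 + \cdots + r_{k-1})] = \tfrac12 \sum_u u'$.

Proof: Let

$\dfrac{n}{2} v = y$;

then

$\int_0^{\infty} c_n e^{-\frac12 nv} v^{r-1}\, dv = c_n \Big(\dfrac{2}{n}\Big)^r \int_0^{\infty} e^{-y} y^{r-1}\, dy = c_n \Big(\dfrac{2}{n}\Big)^r \Gamma(r)$.

Hence

$I = \lim_{n\to\infty} c_n \Big(\dfrac{2}{n}\Big)^r$

but

$c_n = \prod_u \dfrac{\Gamma[\frac12(n-u+1+u')]}{\Gamma[\frac12(n-u+1)]}$

and therefore

$I = \lim_{n\to\infty} \prod_u \dfrac{\Gamma[\frac12(n-u+1+u')]}{\Gamma[\frac12(n-u+1)]} \Big(\dfrac{2}{n}\Big)^{u'/2}$,

in which each factor tends to one by an application of the Stirling approximation. Therefore

$I = \prod_u 1 = 1$.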
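The quantity $c_n (2/n)^r$ appearing in this proof is the constant K tabulated at the end of the paper, and its approach to one can be seen numerically. The sketch below evaluates it through logarithms of the Gamma function; the function names are illustrative assumptions.

```python
import math

def u_prime_pairs(group_sizes):
    """(u, u') for every variate outside the first group, where u' is the
    number of variates in all groups preceding the group of x_u."""
    pairs, preceding = [], 0
    for size in group_sizes:
        if preceding > 0:
            pairs.extend((preceding + j, preceding) for j in range(1, size + 1))
        preceding += size
    return pairs

def constant_K(n, group_sizes):
    """K = (2/n)^r c_n, with r = (1/2) * sum of u' and c_n a product of
    ratios of Gamma functions."""
    pairs = u_prime_pairs(group_sizes)
    r = sum(up for _, up in pairs) / 2
    log_c_n = sum(
        math.lgamma((n - u + 1 + up) / 2) - math.lgamma((n - u + 1) / 2)
        for u, up in pairs
    )
    return math.exp(r * math.log(2 / n) + log_c_n)

for n in (10, 100, 10_000):
    print(n, round(constant_K(n, (2, 2)), 4))
```

For the grouping (2, 2) the value reduces to $(n-2)(n-3)/n^2$, which makes the slow approach to one easy to follow.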

We then write

$\bar\psi(v) = \dfrac{v^{r-1}}{\Gamma(r)}\, \psi(v)$

hence

(3) $h(v) = \dfrac{c_n e^{-\frac12 nv} v^{r-1}}{\Gamma(r)}\, \psi(v)$.

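Proposition 3. For any positive integer s,

$\lim_{n\to\infty} n^s \cdot \text{Prob}(v > 1/\sqrt{n}) = 0$.

Proof: Since $v = -\log \lambda$, the inequality $v > 1/\sqrt{n}$ is equivalent to the
inequality $\lambda < e^{-1/\sqrt{n}}$. Since $\lambda = \prod_{u=p_1+1}^{p} z_u$, the inequality $\lambda < e^{-1/\sqrt{n}}$ implies that
there exists at least one value of u for which

$z_u < e^{-1/((p-p_1)\sqrt{n})}$.

Hence

$\sum_u P(z_u < e^{-1/((p-p_1)\sqrt{n})}) \ge P(\lambda < e^{-1/\sqrt{n}}) = P(v > 1/\sqrt{n})$.

Hence in order to prove Proposition 3 we have only to show that for each u and
any arbitrary positive integer s

$\lim_{n\to\infty} \{ n^s \cdot P(z_u < e^{-1/((p-p_1)\sqrt{n})}) \} = 0$.

From (1) we have

$P(z_u < e^{-1/((p-p_1)\sqrt{n})}) = \dfrac{1}{B[\frac12(n-u+1);\, u'/2]} \int_0^{e^{-1/((p-p_1)\sqrt{n})}} z_u^{\frac12(n-u-1)} (1 - z_u)^{\frac12(u'-2)}\, dz_u$.

Over the range of integration, we have $z_u < e^{-1/((p-p_1)\sqrt{n})}$, so

$P(z_u < e^{-1/((p-p_1)\sqrt{n})}) < \dfrac{e^{-\frac12(n-u-1)/((p-p_1)\sqrt{n})}}{B[\frac12(n-u+1);\, u'/2]} \int_0^{e^{-1/((p-p_1)\sqrt{n})}} (1 - z_u)^{\frac12(u'-2)}\, dz_u$

$\quad = \dfrac{e^{-\frac12(n-u-1)/((p-p_1)\sqrt{n})}}{B[\frac12(n-u+1);\, u'/2]} \Big[ -\dfrac{2}{u'} (1 - z_u)^{u'/2} \Big]_0^{e^{-1/((p-p_1)\sqrt{n})}}$

$\quad = \dfrac{2 e^{-\frac12(n-u-1)/((p-p_1)\sqrt{n})}}{u'\, B[\frac12(n-u+1);\, u'/2]} \big( 1 - (1 - e^{-1/((p-p_1)\sqrt{n})})^{u'/2} \big)$.

It follows from the Stirling formula that

$\lim_{n\to\infty} \Big(\dfrac{n}{2}\Big)^{u'/2} B[\tfrac12(n-u+1);\, u'/2] = \lim_{n\to\infty} \Big(\dfrac{n}{2}\Big)^{u'/2} \dfrac{\Gamma[\frac12(n-u+1)]\, \Gamma(u'/2)}{\Gamma[\frac12(n-u+1+u')]} = \Gamma(u'/2)$.

Since

$\lim_{n\to\infty} n^{s+u'/2} e^{-\sqrt{n}/(2(p-p_1))} = 0$

and

$\lim_{n\to\infty} \big( 1 - (1 - e^{-1/((p-p_1)\sqrt{n})})^{u'/2} \big) = 1$,

the proposition follows.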

Proposition 4. The function $\psi(v)$ of formula (3) can be expanded in a power
series, i.e.

$\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots$

with a finite radius of convergence.

Proof: Wilks$^4$ has considered the following integral equation:

$\int_0^B w^t g(w)\, dw = C B^t\, \dfrac{\Gamma(b_1+t)\, \Gamma(b_2+t) \cdots \Gamma(b_q+t)}{\Gamma(c_1+t)\, \Gamma(c_2+t) \cdots \Gamma(c_q+t)}$

where C, B and $g(w)$ are independent of t, and $b_i < c_i$
$(i = 1, 2, \dots, q)$. Wilks has shown that the solution of the integral equation,
$g(w)$, is given by the following expression:

o(w) = 


,-i/i _ «y* f * 1 

v &.* — t r • ■ • /ri 1 -*- 1 .!--- 1 • •. 

X [l - *(l ~ - {* + W(1 - *» (l - ■ 

X — {«i + Ps(l — pi) 4- ■ • • 

+ — Pl)(l — Uj) . . . (1 — y # _ a ) } 


X dv i dv 3 ■ • • diu_i 


where 


fc _ TT _rCc ( ) 

fi r(b,)T(^b t ) 


i-i .-I 

T» = 2Z c t—i pi = J2 6,-/ 


$^4$ S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, Vol. 24
(1932), pp. 474-5.





the range of w being $0 \le w \le B$. Wilks has furthermore shown that

(5) $\{v_1 + v_2(1 - v_1) + \cdots + v_i(1 - v_1)(1 - v_2) \cdots (1 - v_{i-1})\} \Big(1 - \dfrac{w}{B}\Big) < 1$

for $w > 0$ and $0 \le v_i \le 1$ $(i = 1, 2, \dots, q - 1)$.

We denote the left hand side of (5) by $\zeta_i$. The factor $(1 - \zeta_i)^{b_i - c_{i+1}}$ can be
expanded in a power series, i.e.

(6) $(1 - \zeta_i)^{b_i - c_{i+1}} = (1 - \zeta_i)^{-(c_{i+1} - b_i)}$

$\qquad = 1 + (c_{i+1} - b_i)\zeta_i + \tfrac12 (c_{i+1} - b_i)(c_{i+1} - b_i + 1)\zeta_i^2 + \cdots$

with a radius of convergence equal to one. Since we will show shortly that for
the choices we make for the $b_i$'s and $c_i$'s, $c_{i+1} > b_i$, then all coefficients in this
last expansion are non-negative. Substituting this series expansion (6) in (4),
and ordering it according to powers of $(1 - w/B)$, the expression under the
integral sign (in 4) becomes

(7) $\theta_0(v_1, v_2, \dots, v_{q-1}) + \theta_1(v_1, \dots, v_{q-1}) \Big(1 - \dfrac{w}{B}\Big) + \theta_2(v_1, \dots, v_{q-1}) \Big(1 - \dfrac{w}{B}\Big)^2 + \cdots$.

This series is uniformly convergent over the domain defined by the inequalities
$0 \le v_i \le 1$ $(i = 1, 2, \dots, q - 1)$ and $|1 - w/B| < 1$. We can even say that
(7) is uniformly convergent for $|1 - w/B| < 1$ if we substitute for each $\theta_i$
the maximum of $\theta_i$ with respect to $v_1, v_2, \dots, v_{q-1}$. Hence we may integrate
the series (7) with respect to $v_1, v_2, \dots, v_{q-1}$ term by term, i.e.

(8) $\int_0^1 \int_0^1 \cdots \int_0^1 (7)\, dv_1\, dv_2 \cdots dv_{q-1} = \sigma_0 + \sigma_1 \Big(1 - \dfrac{w}{B}\Big) + \sigma_2 \Big(1 - \dfrac{w}{B}\Big)^2 + \cdots$

and the series (8) is uniformly convergent for $|1 - w/B| < 1$. The coefficients
$\sigma_0, \sigma_1, \dots$ are non-negative.

The case of the $\lambda$ statistic which we are considering is a special case of this
integral equation which we obtain by making the following substitutions:

$w = \lambda, \quad B = 1, \quad u = \tau + p_1, \quad q = p - p_1$,

$b_\tau = \tfrac12(n - u + 1), \quad c_\tau = \tfrac12(n - u + u' + 1), \quad (\tau = 1, 2, \dots, p - p_1)$.

Note that then

$c_{\tau+1} - b_\tau = \tfrac12[(u + 1)' - 1] \ge 0$.

Hence, according to (4)

$g(\lambda)\, d\lambda = k\, \lambda^{\frac12(n-p-1)} (1 - \lambda)^{r-1} \{\sigma_0 + \sigma_1(1 - \lambda) + \sigma_2(1 - \lambda)^2 + \cdots\}\, d\lambda$

where the infinite series converges for $|1 - \lambda| < 1$.

Now $v = -\log \lambda$, or $\lambda = e^{-v}$, hence

$h(v)\, dv = k\, e^{-\frac12(n-p+1)v} (1 - e^{-v})^{r-1} \{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}\, dv$

where the series $\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}$ is obtained from the series
$\{\sigma_0 + \sigma_1(1 - \lambda) + \cdots\}$ by substituting for $(1 - \lambda)$ the Taylor expansion of $(1 - e^{-v})$.
The series $\{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}$ has a finite radius of convergence.$^5$
Hence the function $\psi(v)$ can be written as

$\psi(v) = A\, e^{\frac12(p-1)v} \Big( \dfrac{1 - e^{-v}}{v} \Big)^{r-1} \{\epsilon_0 + \epsilon_1 v + \epsilon_2 v^2 + \cdots\}$

where A denotes a constant factor. Then since $e^{\frac12(p-1)v} \big( \frac{1 - e^{-v}}{v} \big)^{r-1}$ can be
expanded in a Taylor series around v = 0, Proposition 4 is proved.

4. Evaluation of the coefficients in the expansion of $\psi(v)$. Let the series
expansion of $\psi(v)$ be

$\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots$.

Then we have

$\int_0^{\infty} \dfrac{c_n e^{-\frac12 nv} v^{r-1}}{\Gamma(r)} (\alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots)\, dv = 1$.

Now let $v^* = \dfrac{n}{2} v$, then

$\int_0^{\infty} \Big(\dfrac{2}{n}\Big)^r c_n\, \dfrac{e^{-v^*} v^{*\,r-1}}{\Gamma(r)} \Big( \alpha_0 + \dfrac{2\alpha_1}{n} v^* + \dfrac{4\alpha_2}{n^2} v^{*2} + \cdots \Big)\, dv^* = 1$.

Suppose that the asymptotic expansion of $\Big(\dfrac{n}{2}\Big)^r \dfrac{1}{c_n}$ is given by

$\beta_0 + \dfrac{\beta_1}{n} + \dfrac{\beta_2}{n^2} + \cdots$.

On account of Proposition 3, we have that the asymptotic expansion in powers
of 1/n of

(9) $\int_0^{\sqrt{n}/2} \dfrac{e^{-v^*} v^{*\,r-1}}{\Gamma(r)} \Big( \alpha_0 + \dfrac{2\alpha_1}{n} v^* + \dfrac{4\alpha_2}{n^2} v^{*2} + \cdots \Big)\, dv^*$

must be equal to the asymptotic expansion of $\Big(\dfrac{n}{2}\Big)^r \dfrac{1}{c_n}$. Since we may integrate
in (9) term by term for sufficiently large n, we easily obtain

$\alpha_0 = \beta_0, \quad \alpha_1 = \dfrac{\beta_1}{2r}, \quad \dots, \quad \alpha_k = \dfrac{\beta_k}{2^k\, r(r+1) \cdots (r+k-1)}$.

$^5$ See Gutzmer, Theorie der Eindeutigen Analytischen Funktionen, 1908, pp. 91-2.





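The asymptotic expansion of $\Big(\dfrac{n}{2}\Big)^r \dfrac{1}{c_n}$ can be calculated in the following manner:

$\dfrac{\beta_0 + \dfrac{\beta_1}{n+2} + \dfrac{\beta_2}{(n+2)^2} + \cdots}{\beta_0 + \dfrac{\beta_1}{n} + \dfrac{\beta_2}{n^2} + \cdots} = \Big(\dfrac{n+2}{n}\Big)^r \dfrac{c_n}{c_{n+2}}$

and

$\Big(\dfrac{n+2}{n}\Big)^r \dfrac{c_n}{c_{n+2}} = (1 + 2/n)^r \prod_u \dfrac{n - u + 1}{n - u + u' + 1}$.

Equating the right hand members of these last two equations, and taking
logs, we obtain

$\log \Big[ \beta_0 + \dfrac{\beta_1}{n+2} + \dfrac{\beta_2}{(n+2)^2} + \cdots \Big] = r \log(1 + 2/n) + \sum_u \log\Big(1 - \dfrac{u-1}{n}\Big) - \sum_u \log\Big(1 - \dfrac{u-u'-1}{n}\Big) + \log\Big( \beta_0 + \dfrac{\beta_1}{n} + \dfrac{\beta_2}{n^2} + \cdots \Big)$.

Then we expand each term in a series of powers of 1/n and equate coefficients
of $1/n^i$ for each i. We obtain the following formulae for the first five $\beta$'s:

$\beta_0 = 1$

$\beta_1 = r + \tfrac14 \sum_u (u-1)^2 - \tfrac14 \sum_u (u-u'-1)^2$

$\beta_2 = \beta_1 + \dfrac{\beta_1^2}{2} - \dfrac{2r}{3} + \tfrac{1}{12} \sum_u (u-1)^3 - \tfrac{1}{12} \sum_u (u-u'-1)^3$

$\beta_3 = -\tfrac43 \beta_1 - \beta_1^2 - \tfrac13 \beta_1^3 + \beta_1\beta_2 + 2\beta_2 + \tfrac23 r + \tfrac{1}{24} \sum_u (u-1)^4 - \tfrac{1}{24} \sum_u (u-u'-1)^4$

$\beta_4 = 2\beta_1 + 2\beta_1^2 + \beta_1^3 + \dfrac{\beta_1^4}{4} - 3\beta_1\beta_2 + \beta_1\beta_3 - \beta_1^2\beta_2 - 4\beta_2 + \dfrac{\beta_2^2}{2} + 3\beta_3 - \tfrac45 r + \tfrac{1}{40} \sum_u (u-1)^5 - \tfrac{1}{40} \sum_u (u-u'-1)^5$.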
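The formulae for the first four $\beta$'s lend themselves to exact evaluation. The sketch below uses rational arithmetic so that tabulated values such as 6.28125 are reproduced without rounding; the function names are illustrative assumptions, and the sums run over the variates outside the first group.

```python
from fractions import Fraction

def u_prime_pairs(group_sizes):
    """(u, u') for every variate outside the first group."""
    pairs, preceding = [], 0
    for size in group_sizes:
        if preceding > 0:
            pairs.extend((preceding + j, preceding) for j in range(1, size + 1))
        preceding += size
    return pairs

def first_four_betas(group_sizes):
    """Return r and [beta_1, beta_2, beta_3, beta_4] as exact fractions."""
    pairs = u_prime_pairs(group_sizes)
    r = Fraction(sum(up for _, up in pairs), 2)

    def s(m):
        # sum of (u-1)^m minus sum of (u-u'-1)^m
        return sum((u - 1) ** m - (u - up - 1) ** m for u, up in pairs)

    b1 = r + Fraction(s(2), 4)
    b2 = b1 + b1 ** 2 / 2 - 2 * r / 3 + Fraction(s(3), 12)
    b3 = (-Fraction(4, 3) * b1 - b1 ** 2 - b1 ** 3 / 3 + b1 * b2
          + 2 * b2 + 2 * r / 3 + Fraction(s(4), 24))
    b4 = (2 * b1 + 2 * b1 ** 2 + b1 ** 3 + b1 ** 4 / 4 - 3 * b1 * b2
          + b1 * b3 - b1 ** 2 * b2 - 4 * b2 + b2 ** 2 / 2 + 3 * b3
          - 4 * r / 5 + Fraction(s(5), 40))
    return r, [b1, b2, b3, b4]

for grouping in [(2, 1), (2, 2), (1, 1, 1)]:
    r, betas = first_four_betas(grouping)
    print(grouping, float(r), [float(b) for b in betas])
```

For the grouping (2, 2) the expansion can also be found directly, since $(n/2)^r / c_n = 1/[(1 - 2/n)(1 - 3/n)]$, which gives an independent check on the four values.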


5. Practical use of the series. In practical applications, the value of the
statistic, say $\lambda_0$, is calculated, and it is desired that we determine whether or
not this value of the statistic falls into the critical region. That is, for a
particular grouping of the variates, for a particular number of degrees of freedom, and
for a chosen level of significance $\alpha$, there is determined from the distribution of
$\lambda$ a value $\lambda^*$ such that

$\text{Prob}[\lambda < \lambda^*] = \alpha$,

and if $\lambda_0 < \lambda^*$ we reject the hypothesis that in the population from which the
sample is taken all the correlation coefficients between variates in different
groups are zero.

Since v is a monotonic decreasing function of $\lambda$ we make the test by computing
$v_0 = -\log \lambda_0$ and we reject the hypothesis if $v_0 > v^*$ where $v^* = -\log \lambda^*$. But
this is equivalent to computing $\text{Prob}[v > v_0]$ and if this value is less than $\alpha$ we
reject the hypothesis. Now

$\text{Prob}[v > v_0] = I_{v_0}(r_1, r_2, \dots, r_k) = \dfrac{c_n}{\Gamma(r)} \int_{v_0}^{\infty} e^{-\frac12 nv} v^{r-1} (1 + \alpha_1 v + \alpha_2 v^2 + \cdots)\, dv$.

Setting $\dfrac{nv}{2} = z$,

$\text{Prob}[v > v_0] = \Big(\dfrac{2}{n}\Big)^r \dfrac{c_n}{\Gamma(r)} \int_{nv_0/2}^{\infty} e^{-z} z^{r-1} \Big[ 1 + \alpha_1 \Big(\dfrac{2}{n}\Big) z + \alpha_2 \Big(\dfrac{2}{n}\Big)^2 z^2 + \cdots \Big]\, dz$.

On account of Proposition 3 we obtain an asymptotic expansion of $\text{Prob}[v > v_0]$
by integrating the right hand member of the above equation term by term.
This can be expressed by means of the incomplete gamma function, which is
tabulated$^6$ in the form

$I(u, p) = \dfrac{\int_0^{u\sqrt{p+1}} v^p e^{-v}\, dv}{\Gamma(p + 1)}$.

We obtain

$\text{Prob}(v > v_0) = \Big(\dfrac{2}{n}\Big)^r c_n \Big\{ \Big[ 1 - I\Big( \dfrac{nv_0}{2\sqrt{r}},\, r - 1 \Big) \Big] + \dfrac{\beta_1}{n} \Big[ 1 - I\Big( \dfrac{nv_0}{2\sqrt{r+1}},\, r \Big) \Big] + \dfrac{\beta_2}{n^2} \Big[ 1 - I\Big( \dfrac{nv_0}{2\sqrt{r+2}},\, r + 1 \Big) \Big] + \cdots \Big\}$.

The values of the constant $K = \Big(\dfrac{2}{n}\Big)^r c_n$ and the values of $\beta_1, \beta_2, \beta_3, \beta_4$ are
herein tabulated for any grouping which might be made on six or fewer variates.
Some cases, such as groupings (1, p - 1), in which case the distribution of $\lambda$
is the distribution of the multiple correlation coefficient; and as the groupings
(2, p - 2), the exact distribution for which was given by Wilks as an incomplete
Beta-function, are superfluous here. These cases are included only for the sake
of completeness.
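The use of the series may be illustrated on a case where the exact answer is elementary. For the grouping (2, 1), formula (1) gives $\text{Prob}[\lambda < w] = w^{\frac12(n-2)}$, so that $\text{Prob}[v > v_0] = e^{-\frac12(n-2)v_0}$. The sketch below sums the series through $\beta_4$ and compares the two. It is an illustration only: the incomplete gamma function is computed from its power series rather than read from the tables, the values n = 20 and $v_0 = 0.3$ are arbitrary, and the function names are assumptions. In terms of the tabulated function, each bracket $1 - I(nv_0/(2\sqrt{r+k}),\, r+k-1)$ is the upper tail computed here.

```python
import math

def regularized_lower_gamma(a, x, terms=200):
    """P(a, x) from the series x^a e^{-x} * sum_k x^k / Gamma(a + k + 1)."""
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(terms):
        total += math.exp((a + k) * math.log(x) - x - math.lgamma(a + k + 1))
    return min(total, 1.0)

def tail_probability(v0, n, r, K, betas):
    """K * sum_k (beta_k / n^k) * [1 - P(r + k, n v0 / 2)], beta_0 = 1."""
    x = n * v0 / 2
    return K * sum(
        beta / n ** k * (1.0 - regularized_lower_gamma(r + k, x))
        for k, beta in enumerate(betas)
    )

n, v0 = 20, 0.3
# Grouping (2, 1): r = 1, K = (n - 2)/n, beta_k = 2^k.
approx = tail_probability(v0, n, r=1.0, K=(n - 2) / n, betas=[1, 2, 4, 8, 16])
exact = math.exp(-v0 * (n - 2) / 2)
print(approx, exact)
```

Each further term of the series carries one more power of 1/n, so the agreement improves as n grows.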


$^6$ K. Pearson (Editor), Tables of the Incomplete Gamma Function, Biometric Laboratory,
London, 1922.



Table of the First Four β's

Grouping        r      β1       β2           β3            β4
2,1             1      2        4            8             16
1,1,1           1.5    2.75     6.28125      13.38281      27.57568
3,1             1.5    3.75     12.03125     36.91406      111.55225
2,2             2      5        19           65            211
2,1,1           2.5    5.75     23.53125     83.97656      279.50538
1,1,1,1         3      6.5      28.625       106.9375      366.39844
4,1             2      6        28           120           496
3,2             3      9        55           285           1351
3,1,1           3.5    9.75     62.53125     334.10156     1615.91163
2,2,1           4      11       77           439           2229
2,1,1,1         4.5    11.75    86.03125     506.16406     2628.23974
1,1,1,1,1       5      12.5     95.625       580.6875      3085.52344
5,1             2.5    8.75     55.78125     315.82031     1690.65282
4,2             4      14       125          910           5901
3,3             4.5    15.75    154.03125    1205.03906    8277.55226
4,1,1           4.5    14.75    136.28125    1015.50781    6693.45068
3,2,1           5.5    17.75    189.53125    1584.10156    11445.75538
2,2,2           6      19       214          1866          13947
3,1,1,1         6      18.5     203.625      1740.9375     12797.27344
2,2,1,1         6.5    19.75    229.03125    2042.16406    15530.08351
2,1,1,1,1       7      20.5     244.625      2230.1875     17257.64836
1,1,1,1,1,1     7.5    21.25    260.78125    2430.49219    19139.02892





Tables of the Constant $K = (2/n)^r c_n$

  n     21     111    31     22     211    1111   41     311
  10   .800   .738   .646   .560   .517   .477   .480   .310
  11   .818   .761   .676   .595   .553   .515   .521   .352
  12   .833   .780   .702   .625   .585   .548   .556   .390
  13   .846   .796   .724   .651   .612   .576   .586   .424
  14   .857   .810   .743   .674   .637   .602   .612   .455
  15   .867   .822   .759   .693   .658   .624   .636   .482
  16   .875   .833   .774   .711   .677   .645   .656   .508
  17   .882   .843   .787   .727   .694   .663   .675   .531
  18   .889   .851   .798   .741   .709   .679   .691   .552
  19   .895   .859   .808   .754   .723   .694   .706   .571
  20   .900   .866   .818   .765   .736   .708   .720   .589
  22   .909   .878   .834   .785   .758   .732   .744   .620
  24   .917   .888   .847   .802   .777   .752   .764   .647
  26   .923   .896   .859   .817   .793   .770   .781   .671
  28   .929   .903   .869   .829   .807   .785   .796   .691
  30   .933   .910   .877   .840   .819   .798   .809   .710
  35   .943   .922   .894   .862   .843   .825   .835   .747
  40   .950   .932   .908   .879   .862   .846   .855   .776
  45   .956   .940   .918   .892   .877   .862   .871   .799
  50   .960   .946   .926   .902   .889   .875   .883   .818
  55   .964   .950   .932   .911   .899   .886   .894   .833
  60   .967   .954   .938   .918   .907   .895   .902   .846
  65   .969   .958   .943   .924   .914   .903   .910   .858
  70   .971   .961   .947   .930   .920   .910   .916   .867
  80   .975   .966   .953   .938   .930   .921   .926   .883
  90   .978   .970   .959   .945   .937   .929   .934   .896
 100   .980   .973   .963   .951   .943   .936   .941   .906


Tables of the Constant K

  n    221   2111     32  11111     51     42     33
 10   .269   .248   .336   .229   .323   .168   .136
 11   .310   .288   .379   .268   .369   .206   .171
 12   .347   .325   .417   .304   .410   .243   .205
 13   .381   .359   .451   .338   .445   .277   .237
 14   .412   .390   .481   .368   .478   .309   .268
 15   .441   .418   .508   .397   .506   .339   .297
 16   .467   .444   .533   .423   .532   .367   .324
 17   .490   .468   .556   .447   .555   .392   .350
 18   .512   .490   .576   .470   .576   .416   .374
 19   .532   .511   .595   .490   .596   .438   .396
 20   .551   .530   .612   .510   .613   .459   .417
 22   .584   .564   .642   .544   .644   .496   .455
 24   .613   .593   .668   .575   .671   .529   .489
 26   .638   .619   .691   .601   .694   .558   .519
 28   .660   .642   .711   .625   .714   .584   .546
 30   .680   .662   .728   .646   .731   .607   .570
 35   .720   .704   .764   .689   .767   .654   .621
 40   .751   .737   .791   .723   .794   .692   .661
 45   .776   .763   .813   .751   .816   .722   .694
 50   .797   .785   .830   .773   .833   .747   .721
 55   .814   .803   .845   .792   .848   .768   .743
 60   .828   .818   .857   .808   .860   .786   .762
 65   .841   .831   .868   .822   .870   .801   .779
 70   .852   .842   .877   .833   .879   .814   .793
 80   .869   .861   .892   .853   .894   .836   .817
 90   .883   .876   .903   .869   .905   .853   .836
100   .894   .888   .913   .881   .915   .867   .852



Tables of the Constant K

  n    411    321    222   3111   2211  21111 111111
 10   .155   .108   .094   .100   .087   .080   .070
 11   .192   .140   .123   .130   .114   .106   .099
 12   .228   .171   .152   .160   .142   .133   .125
 13   .261   .201   .180   .189   .170   .160   .160
 14   .292   .230   .208   .217   .197   .186   .176
 15   .322   .257   .235   .244   .223   .212   .201
 16   .349   .284   .261   .270   .248   .236   .225
 17   .375   .309   .285   .295   .272   .260   .248
 18   .398   .332   .308   .318   .296   .283   .271
 19   .421   .354   .330   .340   .317   .304   .292
 20   .442   .375   .361   .361   .338   .325   .313
 22   .479   .414   .390   .400   .376   .363   .351
 24   .512   .448   .424   .434   .411   .398   .385
 26   .542   .479   .456   .465   .442   .430   .417
 28   .568   .507   .484   .493   .471   .468   .446
 30   .591   .532   .510   .519   .497   .484   .472
 35   .640   .585   .564   .573   .552   .540   .528
 40   .679   .628   .608   .616   .597   .586   .574
 45   .710   .663   .644   .652   .633   .623   .612
 50   .736   .692   .674   .681   .664   .654   .644
 55   .758   .716   .700   .706   .690   .681   .671
 60   .776   .737   .722   .728   .712   .704   .695
 65   .792   .765   .740   .746   .732   .723   .715
 70   .805   .771   .757   .762   .749   .741   .733
 80   .828   .797   .784   .789   .777   .770   .762
 90   .846   .818   .806   .811   .800   .793   .786
100   .860   .835   .824   .828   .818   .812   .806


THE MEAN SQUARE SUCCESSIVE DIFFERENCE 


By J. von Neumann, 1 R. H. Kent, H. R. Bellinson and B. I. Hart 

Aberdeen Proving Ground 


1. Introduction. In making measurements, every precaution is generally
taken to hold the conditions of the experiment constant, in order that the
population, whose parameters are to be estimated from the observations, shall
remain fixed throughout the experiment. One wishes each observation to come
from the same population, or what is the same thing if normality is assumed,
from populations having the same means and standard deviations.

There are cases, however, where the standard deviation may be held constant, 
but the mean varies from one observation to the next. If no correction is made 
for such variation of the mean, and the standard deviation is computed from 
the data in the conventional way, then the estimated standard deviation will 
tend to be larger than the true population value. When the variation in the 
mean is gradual, so that a trend (which need not be linear) is shifting the mean 
of the population, a rather simple method of minimizing the effect of the trend 
on dispersion is to estimate standard deviation from differences. It is for this 
purpose that the mean square successive difference

(1) $$\delta^2 = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{n-1}$$

is suggested. The subscript $i$ in this expression refers to the temporal order of
the observation $x_i$.

In using $\delta^2$ for estimating standard deviation, the distribution of $\delta^2$ in random
samples is of interest, since questions of bias, efficiency, and confidence interval
require consideration. $\delta^2$ may be used, in addition, to determine whether a
trend actually exists; in this case one must know whether $\delta^2$ differs significantly
from

(2) $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n},$$

which measures variance independently of the order of the observations, and
consequently includes the effect of the trend.
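The contrast between (1) and (2) can be illustrated with a short Python sketch. The data below are hypothetical (unit-variance normal noise on a slow linear trend), the function names are illustrative only, and halving $\delta^2$ anticipates the unbiased estimate given in Section 3.

```python
import random

def mean_square_successive_difference(x):
    """delta^2 of equation (1): mean of the squared successive differences."""
    n = len(x)
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)

def sample_variance(x):
    """s^2 of equation (2): mean squared deviation from the sample mean (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n

# Hypothetical data: sigma = 1 noise superposed on a gradual linear trend.
rng = random.Random(0)
n = 200
x = [0.05 * i + rng.gauss(0.0, 1.0) for i in range(n)]

half_delta2 = mean_square_successive_difference(x) / 2  # close to sigma^2 = 1
s2 = sample_variance(x)                                  # inflated by the trend
print(half_delta2, s2)
```

Under these assumptions the successive-difference estimate stays near the true variance, while $s^2$ absorbs the variance of the trend itself.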


1 Institute for Advanced Study, Princeton, N, J. Also member of Scientific Advisory 
Committee of the Ballistic Research Laboratory, Aberdeen Proving Ground. 


The distribution of $\delta^2$ is considered in this paper; it is hoped that others will
shortly publish methods of estimating the probability that $\delta^2 \le k s^2$ as a function
of $k$ and the sample size $n$.


2. History. A somewhat similar procedure is suggested by “Student” [1] 
and E. S. Pearson [2] who consider the situation in which a shift may occur in 
the mean of the population, but where pairs of observations may be made with 
no shift in mean between them; standard deviation may be estimated from the 
differences between these pairs. The method can be generalized, and

$$s' = \sqrt{\frac{1}{n} \sum_{i=1}^{n/2} (x_{2i} - x_{2i-1})^2}$$

is an estimate of the standard deviation; $n$ must, of course, be an even integer.
This estimate has the advantage that its properties are fully known: $s'$ is
distributed as the standard deviation with $f = n/2$ degrees of freedom. It will be
noted that this estimate does not involve the successive differences, but only
the alternate ones. Although there are $n - 1$ available successive differences,
this estimate uses only the $n/2$ independent differences. The mean square
successive difference is based on all $n - 1$ successive differences, and should
therefore provide a more efficient estimate of $\sigma$ than does $s'$.

There is, of course, nothing new in the concept of estimating the standard
deviation from differences. Even as far back as 1870, an interest in the method
appears to have existed. Jordan [3] devised methods based on sums of powers
of the differences. Helmert [4] gave more careful consideration to the case of
the first power, i.e. the sum of the absolute differences. In both these cases,
however, all the $n(n - 1)/2$ differences that can be established from a sample of
$n$ observations were included in the estimate, so that the estimate was of no
value in reducing the effect of a trend. Helmert realized this, for he pointed
out that the estimate obtained from the sum of squares of the differences is
exactly that obtained by the more conventional procedure of squaring deviations
from the mean.

The usefulness of the differences between successive observations only appears 
to have been realized first by ballisticians, who faced the problem of minimizing 
effects due to wind variation, heat and wear in measuring the dispersion of the 
distance traveled by shell. Vallier [5] appears to have been the first to estimate
dispersion from successive differences. Cranz and Becker [6] commended the
mean successive difference

$$E_d = \frac{\sum_{i=1}^{n-1} |x_{i+1} - x_i|}{n - 1}.$$

To establish the precision of $E_d$ in estimating $\sigma$, Cranz and Becker quoted
Helmert's paper, and so erred in saying that their method was superior to that
of the mean deviation. Helmert's procedure, based on $n(n - 1)/2$ differences,
is indeed more precise (for $n > 10$) than the mean deviation

$$\mathrm{M.D.} = \frac{\sum |x_i - \bar{x}|}{n},$$

but the mean successive difference is based on but $n - 1$ differences, and so is
not as precise.

Bennett [7] appears to have suggested the use of successive differences
independently of the European ballisticians. In recent years, the method of
estimation by the mean square successive difference $\delta^2$ was put into practice in the
Ballistic Research Laboratory at the Aberdeen Proving Ground, U. S. Army,
by L. S. Dederick.


3. Bias and efficiency. The moments of $\delta^2$ in samples drawn from a normal
population are derived in Section 6 of this paper. The moments are used at
this point to establish the estimate of variance, and the efficiency of this estimate.
The mean value of $\delta^2$ in samples taken at random from a normal population is

(3) $$E(\delta^2) = 2\sigma^2.$$

$\delta^2/2$ consequently offers an unbiased estimate of variance, and this estimate is

(4) $$\frac{\delta^2}{2} = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{2(n - 1)}.$$

The second moment about the mean, i.e., the variance, of $\delta^2$ in samples of size $n$ is

(5) $$\sigma^2_{\delta^2} = \frac{4(3n - 4)}{(n - 1)^2}\,\sigma^4.$$

As the sample size is increased, the distribution of $\delta^2$ appears to approach
the normal. It is therefore appropriate to consider the efficiency as defined by
Fisher [8]. Accordingly, the efficiency of $\delta^2$ is


$$\frac{\sigma^2_{s^2} / [E(s^2)]^2}{\sigma^2_{\delta^2} / [E(\delta^2)]^2}.$$

Since

$$\sigma^2_{s^2} = \frac{2(n - 1)}{n^2}\,\sigma^4, \qquad E(s^2) = \frac{n - 1}{n}\,\sigma^2,$$

the efficiency of $\delta^2$ in estimating the standard deviation is

(6) $$\frac{2(n - 1)}{3n - 4} = \frac{2}{3}\left[1 + \frac{1}{3n - 4}\right].$$


The efficiency is unity for $n = 2$, since in this case the two statistics have
the same distribution. It therefore appears that the efficiency decreases as the
sample size increases, but approaches 2/3 as a limiting value for $n$ very large.
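As a check on (6), the efficiency can be recomputed in exact rational arithmetic from the means and variances quoted above. This is only a sketch: the function names are illustrative and $\sigma$ is taken as 1.

```python
from fractions import Fraction

def var_delta2(n):
    """Variance of delta^2 in units of sigma^4, equation (5)."""
    return Fraction(4 * (3 * n - 4), (n - 1) ** 2)

def var_s2(n):
    """Variance of s^2 (divisor n) in units of sigma^4."""
    return Fraction(2 * (n - 1), n ** 2)

def efficiency(n):
    """Ratio of the squared coefficients of variation of s^2 and delta^2."""
    mean_delta2 = Fraction(2)          # equation (3)
    mean_s2 = Fraction(n - 1, n)
    return (var_s2(n) / mean_s2 ** 2) / (var_delta2(n) / mean_delta2 ** 2)

for n in (2, 3, 10, 100, 10_000):
    e = efficiency(n)
    assert e == Fraction(2 * (n - 1), 3 * n - 4)   # closed form (6)
    print(n, float(e))
```

The printed values fall from 1 at $n = 2$ toward 2/3, in agreement with the limiting value stated in the text.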


4. Summary of procedure. Having a statistic which estimates a parameter
of a population, it is desirable to know the distribution of that statistic as
computed from samples taken at random from that population. At present, the
distribution of $\delta^2$ in samples of $n$ has not been obtained. The difficulty is in the
fact that the successive differences are not independent. The first difference,
$d_1 = x_2 - x_1$, and the second difference, $d_2 = x_3 - x_2$, are related in that they
both involve $x_2$. Similar correlation exists between every successive pair of
differences between successive observations.

For $n = 2$, and samples taken from a normal population, the distribution of
$\delta^2$ is known. Since

$$\delta^2 = (x_2 - x_1)^2 = 2 \sum_{i=1}^{2} (x_i - \bar{x})^2 = 4s^2,$$

the distribution of $\delta^2$ is similar to that of $s^2$ for this sample size.

For $n = 3$, the distribution of $\delta^2$ has been derived analytically. The
derivation is indicated in Section 5 of this paper. For $n > 3$, only the moments of
the distribution have thus far been obtained. A Pearson type distribution has
been fitted to the first three moments to obtain an approximate representation
of the true distribution.


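5. Distribution of $\delta^2$. In the case of a sample of $n$ taken from a normal
population, the probability that the first observation lies between $x_1$ and $x_1 + dx_1$,
while the second lies between $x_2$ and $x_2 + dx_2$, etc., is

(7) $$\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2\right] dx_1\, dx_2 \cdots dx_n.$$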




If $y_i = x_{i+1} - x_i$, this expression becomes

(8) $$\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n e^{-Q/2\sigma^2}\, dx_1\, dy_1 \cdots dy_{n-1},$$




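where $Q$ is a quadratic form in $x_1$ and the $y$'s. Since

$$\delta^2 = \frac{\sum_{i=1}^{n-1} y_i^2}{n - 1},$$

the probability that $\delta^2$ shall be less than some value $\delta_0^2$ is

(9) $$P(\delta^2 < \delta_0^2) = \left[\frac{1}{\sigma\sqrt{2\pi}}\right]^n \int \cdots \int_{\sum y_i^2 < (n-1)\delta_0^2} e^{-Q/2\sigma^2}\, dx_1\, dy_1 \cdots dy_{n-1}.$$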

After the integration with respect to $x_1$ is carried out, the quadratic form in
the exponent may be normalized by a transformation to new coordinates $z_i$
linearly related to the $y$'s. The $z$'s may be so chosen that all the terms $z_i^2$ in
the exponent have the same coefficient, in which case

(10) $$P(\delta^2 < \delta_0^2) = c_1 \int \cdots \int e^{-k \sum z_i^2}\, dz_1 \cdots dz_{n-1}.$$

As a result of such a transformation, the sphere of integration in (9) becomes an
ellipsoid in (10). By changing to polar coordinates, with $r^2 = \sum_{i=1}^{n-1} z_i^2$,

(11) $$P(\delta^2 < \delta_0^2) = c_2 \iint e^{-kr^2} r^{n-2}\, d\Omega\, dr,$$

in which $\Omega$ is the solid angle in the space of $n - 1$ dimensions. The limits of
integration with respect to $\Omega$ as a function of $r$ must be found; this involves the
evaluation of the solid angle subtended by the surface bounded by the
intersection of the $(n - 1)$-dimensional sphere and the $(n - 1)$-dimensional ellipsoid.
If $\Omega = \phi(r)$,

(12) $$P(\delta^2 < \delta_0^2) = c_2 \int_0^a e^{-kr^2} \phi(r)\, r^{n-2}\, dr,$$

in which $a$ is the longest semi-axis of the $(n - 1)$-dimensional ellipsoid
corresponding to the given value of $\delta_0$.

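For $n = 3$, (9) becomes

(13) $$P(\delta^2 < \delta_0^2) = \left[\frac{1}{\sigma\sqrt{2\pi}}\right]^3 \iiint \exp\left[-\frac{1}{2\sigma^2}\left(x_1^2 + (x_1 + y_1)^2 + (x_1 + y_1 + y_2)^2\right)\right] dx_1\, dy_1\, dy_2 = \frac{1}{2\sqrt{3}\,\pi\sigma^2} \iint_{y_1^2 + y_2^2 < 2\delta_0^2} \exp\left[-\frac{y_1^2 + y_1 y_2 + y_2^2}{3\sigma^2}\right] dy_1\, dy_2.$$

Normalizing the quadratic form in the exponent, with $z_1 = (y_1 + y_2)/\sqrt{2}$ and $z_2 = (y_1 - y_2)/\sqrt{2}$,

(14) $$P(\delta^2 < \delta_0^2) = \frac{1}{2\sqrt{3}\,\pi\sigma^2} \iint_{z_1^2 + z_2^2 < 2\delta_0^2} \exp\left[-\frac{3z_1^2 + z_2^2}{6\sigma^2}\right] dz_1\, dz_2,$$

and in polar coordinates

(15) $$P(\delta^2 < \delta_0^2) = \frac{1}{2\sqrt{3}\,\pi\sigma^2} \int_0^{\sqrt{2}\,\delta_0} \int_0^{2\pi} e^{-r^2[3\cos^2\theta + \sin^2\theta]/6\sigma^2}\, r\, d\theta\, dr = \frac{1}{2\sqrt{3}\,\pi\sigma^2} \int_0^{\sqrt{2}\,\delta_0} r\, e^{-r^2/3\sigma^2} \left[\int_0^{2\pi} e^{-r^2 \cos 2\theta / 6\sigma^2}\, d\theta\right] dr.$$

The integral in brackets can be shown to be a Bessel function of zero order;
for let

$$r^2/3\sigma^2 = -2iu,$$

then

(16) $$\int_0^{2\pi} e^{iu \cos 2\theta}\, d\theta = \int_0^{2\pi} e^{iu \cos\phi}\, d\phi = 2\pi J_0(u).$$

Consequently, (15) takes the form

(17) $$P(\delta^2 < \delta_0^2) = \frac{1}{\sqrt{3}\,\sigma^2} \int_0^{\sqrt{2}\,\delta_0} r\, e^{-r^2/3\sigma^2} J_0\left(\frac{i r^2}{6\sigma^2}\right) dr = F(\delta_0^2).$$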

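The probability density function is

(18) $$P(\delta^2) = \frac{dF(\delta^2)}{d\delta^2} = \frac{1}{\sigma^2\sqrt{3}}\, e^{-2\delta^2/3\sigma^2} J_0\left(\frac{i\delta^2}{3\sigma^2}\right) = \frac{1}{\sigma^2\sqrt{3}}\, e^{-2\delta^2/3\sigma^2} \left[1 + \frac{\delta^4}{2^2 3^2 \sigma^4} + \frac{\delta^8}{2^2 4^2 3^4 \sigma^8} + \frac{\delta^{12}}{2^2 4^2 6^2 3^6 \sigma^{12}} + \cdots\right].$$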
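A numerical sanity check of the $n = 3$ density can be sketched in Python. The sketch assumes the form $\frac{1}{\sigma^2\sqrt{3}} e^{-2\delta^2/3\sigma^2} I_0(\delta^2/3\sigma^2)$, where $I_0(z) = J_0(iz)$ is the Bessel function of imaginary argument written as its power series; the integration range and step count are arbitrary choices, not taken from the paper.

```python
import math

def bessel_i0(z, terms=60):
    """I0(z) = J0(iz) from its power series 1 + z^2/2^2 + z^4/(2^2 4^2) + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (z / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def density_n3(w, sigma=1.0):
    """Density of w = delta^2 for samples of three, as in (18)."""
    s2 = sigma ** 2
    return (math.exp(-2.0 * w / (3.0 * s2)) * bessel_i0(w / (3.0 * s2))
            / (s2 * math.sqrt(3.0)))

def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, steps)) + 0.5 * f(b))

total = trapezoid(density_n3, 0.0, 120.0, 24000)
mean = trapezoid(lambda w: w * density_n3(w), 0.0, 120.0, 24000)
print(total, mean)
```

With $\sigma = 1$ the density should integrate to about 1 and have mean about 2, the latter being the value $E(\delta^2) = 2\sigma^2$ of equation (3).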




6. Moments. The $t$-th moment of $\delta^2$ about the origin is defined by

(19) $$\mu_t' = E\left[(\delta^2)^t\right]$$

or

(20) $$(n - 1)^t \mu_t' = E\left[\left(\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2\right)^t\right] = E\left[\left(2 \sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2 \sum_{i=1}^{n-1} x_{i+1} x_i\right)^t\right].$$

For any value of $t$, the expansion can be performed, and similar terms
collected and enumerated. The values of $x$ can be considered as true errors, i.e.
as deviations from the true mean, without affecting the conclusions. If the
original population from which the samples have been drawn is normal, with
standard deviation $\sigma$, then

(21) $$E(x^{2k-1}) = 0, \qquad E(x^{2k}) = \frac{(2k)!}{2^k\, k!}\, \sigma^{2k},$$

and since, in the null case where the mean of the population remains constant,
successive observations are independent, then

(22) $$E(x_i^r x_j^s) = E(x^{r+s}), \quad i = j; \qquad E(x_i^r x_j^s) = E(x^r)\, E(x^s), \quad i \ne j.$$


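These relations are sufficient for the evaluation of $\mu_t'$. For example, in the
case of the second moment, $t = 2$:

(23) $$(n - 1)^2 \mu_2' = E\left[\left(2 \sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2 \sum_{i=1}^{n-1} x_{i+1} x_i\right)^2\right].$$

Now:

$$\left[2 \sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2 \sum_{i=1}^{n-1} x_{i+1} x_i\right]^2$$
$$= 4\left(\sum_{i=1}^{n} x_i^2\right)^2 + (x_1^2 + x_n^2)^2 + 4\left(\sum_{i=1}^{n-1} x_{i+1} x_i\right)^2 - 4(x_1^2 + x_n^2) \sum_{i=1}^{n} x_i^2 - 8 \sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n-1} x_{i+1} x_i + 4(x_1^2 + x_n^2) \sum_{i=1}^{n-1} x_{i+1} x_i$$
$$= 4\left[\sum_{i=1}^{n} x_i^4 + \sum_{i \ne j} x_i^2 x_j^2\right] + \left[x_1^4 + 2x_1^2 x_n^2 + x_n^4\right] + 4\left[\sum_{i=1}^{n-1} x_{i+1}^2 x_i^2\right] - 4\left[x_1^4 + x_1^2 \sum_{i=2}^{n} x_i^2 + x_n^2 \sum_{i=1}^{n-1} x_i^2 + x_n^4\right] + [\text{terms containing odd powers of } x_i].$$

The mean of these terms is found by using (21) and (22), and the number of
each type of term present is enumerated:

$$4[n(3\sigma^4) + n(n - 1)\sigma^2\sigma^2] + [3\sigma^4 + 2\sigma^2\sigma^2 + 3\sigma^4] + 4[(n - 1)\sigma^2\sigma^2] - 4[3\sigma^4 + \sigma^2 (n - 1)\sigma^2 + \sigma^2 (n - 1)\sigma^2 + 3\sigma^4] = (4n^2 + 4n - 12)\sigma^4.$$

Consequently

(24) $$\mu_2' = \frac{4(n^2 + n - 3)}{(n - 1)^2}\, \sigma^4.$$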

The first four moments about the origin were evaluated by this procedure,
and from these, the moments about the mean are readily determined. The
results are:

(25)
$$\mu_1' = 2\sigma^2$$
$$\mu_2' = \frac{4(n^2 + n - 3)}{(n - 1)^2}\, \sigma^4$$
$$\mu_3' = \frac{8(n^3 + 6n^2 + 2n - 21)}{(n - 1)^3}\, \sigma^6$$
$$\mu_4' = \frac{16(n^4 + 14n^3 + 53n^2 - 8n - 231)}{(n - 1)^4}\, \sigma^8$$
$$\mu_1 = 0$$
$$\mu_2 = \frac{4(3n - 4)}{(n - 1)^2}\, \sigma^4$$
$$\mu_3 = \frac{32(5n - 8)}{(n - 1)^3}\, \sigma^6$$
$$\mu_4 = \frac{48(9n^2 + 46n - 112)}{(n - 1)^4}\, \sigma^8$$

It should be noted at this point that the above fourth moment is incorrect
for $n = 2$. One of the terms in the expansion of the right side of (20), for
$t = 4$, is

$$x_1^2 x_n^2 \sum_{i=1}^{n-1} x_{i+1}^2 x_i^2.$$

For $n = 2$, the mean value of this term is

$$E(x_1^2 x_2^2\, x_2^2 x_1^2) = E(x_1^4)\, E(x_2^4) = 9\sigma^8,$$

whereas for $n > 2$, the mean value is

$$E(x_1^4 x_2^2 x_n^2) + E\left(x_1^2 x_n^2 \sum_{i=2}^{n-2} x_{i+1}^2 x_i^2\right) + E(x_1^2 x_{n-1}^2 x_n^4) = (n + 3)\sigma^8.$$
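The passage from the moments about the origin to the moments about the mean in (25) can be verified with exact rational arithmetic. The following Python sketch sets $\sigma = 1$ and uses the standard conversion formulas; the function names are illustrative.

```python
from fractions import Fraction

def raw_moments(n):
    """Moments of delta^2 about the origin from (25), with sigma = 1."""
    d = n - 1
    return (
        Fraction(2),
        Fraction(4 * (n**2 + n - 3), d**2),
        Fraction(8 * (n**3 + 6 * n**2 + 2 * n - 21), d**3),
        Fraction(16 * (n**4 + 14 * n**3 + 53 * n**2 - 8 * n - 231), d**4),
    )

def central_from_raw(m1, m2, m3, m4):
    """Standard conversion from moments about the origin to moments about the mean."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu2, mu3, mu4

def central_moments_closed_form(n):
    """Moments about the mean as listed in (25), with sigma = 1."""
    d = n - 1
    return (
        Fraction(4 * (3 * n - 4), d**2),
        Fraction(32 * (5 * n - 8), d**3),
        Fraction(48 * (9 * n**2 + 46 * n - 112), d**4),
    )

for n in range(3, 60):
    assert central_from_raw(*raw_moments(n)) == central_moments_closed_form(n)
```

The loop starts at $n = 3$ because, as noted above, the fourth moment formula does not hold for $n = 2$.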


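7. Pearson type fit to distribution of $\delta^2$. From the moments it is found that

(26) $$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(5n - 8)^2}{(3n - 4)^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3(9n^2 + 46n - 112)}{(3n - 4)^2}.$$

As $n$ becomes large, $\beta_1$ and $\beta_2$ approach 0 and 3 respectively; the distribution
therefore appears to approach the normal for large samples. For finite sample
sizes, the values of $\beta_1$ and $\beta_2$ correspond to those of the Pearson Type VI
distribution,

$$P\left(\frac{\delta^2}{\sigma^2}\right) = c \left(\frac{\delta^2}{\sigma^2} + a_1\right)^{q_2} \left(\frac{\delta^2}{\sigma^2} + a_2\right)^{-q_1}.$$

The origin of this distribution is at $\delta^2 = -a_1\sigma^2$, but the origin of the true
distribution must be at $\delta^2 = 0$. By taking $a_1 = 0$ so that the origin is at $\delta^2 = 0$,
we obtain what appears to be a suitable approximation

(27) $$P\left(\frac{\delta^2}{\sigma^2}\right) = c \left(\frac{\delta^2}{\sigma^2}\right)^{q_2} \left(\frac{\delta^2}{\sigma^2} + a_2\right)^{-q_1}.$$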

The parameters are determined by equating the 1st, 2nd and 3rd moments of
(27) to the corresponding moments of the true distribution, with the result that

(28)
$$q_2 = \frac{3n^4 - 10n^3 - 18n^2 + 79n - 60}{8n^3 - 50n + 48},$$
$$q_1 = \frac{4 - \mu_2 (q_2 + 1)(q_2 + 3)}{4 - \mu_2 (q_2 + 1)},$$
$$a_2 = \frac{2(q_1 - q_2 - 2)}{q_2 + 1},$$
$$c = \frac{a_2^{\,q_1 - q_2 - 1}}{B(q_2 + 1,\; q_1 - q_2 - 1)}.$$


Values of these parameters for selected values of $n$ are given in Table I. The
sixth and seventh columns of this table give the values of $\beta_2$ for the distribution
(27) and for the true distribution, respectively.


TABLE I

 (1)      (2)         (3)        (4)             (5)              (6)          (7)         (8)
  n      $q_1$       $q_2$      $a_2$            $c$           $\beta_2$    $\beta_2$     Ratio
                                                                 (27)         True       (6)/(7)
  5     24.4391     0.6391     26.6000    5.8800 x 10^34        8.807        8.504       1.036
  7     31.1286     1.3857     23.2571    4.9285 x 10^42        6.948        6.758       1.028
 10     41.2830     2.5079     20.9667    9.4934 x 10^54        5.658        5.538       1.022
 15     58.2113     4.3806     19.2659    4.0240 x 10^75        4.718        4.645       1.016
 20     75.1210     6.2543     18.4351    1.8063 x 10^96        4.269        4.217       1.012
 25     92.0189     8.1285     17.9417    8.1097 x 10^116       4.006        3.965       1.010
 50    176.4443    17.5018     16.9651    1.3386 x 10^220       3.494        3.475       1.005
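Columns (2) to (4) and (6) to (8) of Table I can be recomputed from (26) and (28) with a few lines of Python. The sketch below assumes that $\mu_2$ in (28) is expressed in units of $\sigma^4$ (that is, $\sigma = 1$), which reproduces the tabulated values. The moment formula used for the fitted curve is the standard one for a density proportional to $w^{q_2}(w + a_2)^{-q_1}$ and is not quoted from the paper.

```python
def fit_parameters(n):
    """q1, q2, a2 of (28), taking sigma = 1 so that mu2 = 4(3n-4)/(n-1)^2."""
    mu2 = 4.0 * (3 * n - 4) / (n - 1) ** 2
    q2 = (3 * n**4 - 10 * n**3 - 18 * n**2 + 79 * n - 60) / (8 * n**3 - 50 * n + 48)
    q1 = (4 - mu2 * (q2 + 1) * (q2 + 3)) / (4 - mu2 * (q2 + 1))
    a2 = 2 * (q1 - q2 - 2) / (q2 + 1)
    return q1, q2, a2

def beta2_fitted(n):
    """beta_2 of the fitted curve (27), from its first four moments about the origin."""
    q1, q2, a2 = fit_parameters(n)
    alpha, beta = q2 + 1, q1 - q2 - 1   # density ~ w^(alpha-1) (w + a2)^-(alpha+beta)
    raw, m = [], 1.0
    for k in range(1, 5):
        m *= a2 * (alpha + k - 1) / (beta - k)   # k-th moment about the origin
        raw.append(m)
    m1, m2, m3, m4 = raw
    mu2 = m2 - m1**2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu4 / mu2**2

def beta2_true(n):
    """beta_2 of the true distribution, equation (26)."""
    return 3.0 * (9 * n**2 + 46 * n - 112) / (3 * n - 4) ** 2

for n in (5, 7, 10, 15, 20, 25, 50):
    q1, q2, a2 = fit_parameters(n)
    print(n, round(q1, 4), round(q2, 4), round(a2, 4),
          round(beta2_fitted(n), 3), round(beta2_true(n), 3),
          round(beta2_fitted(n) / beta2_true(n), 3))
```

Since only the first three moments are matched, the ratio in the last column measures how far the fourth moment of the fitted curve departs from the true one.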


The Tables of the Incomplete Beta-Function [9] can be used to evaluate the
probability integral of the distribution (27),

(29) $$P(\delta^2 < \delta_0^2) = 1 - I_x(q_1 - q_2 - 1,\; q_2 + 1), \qquad x = \frac{a_2}{a_2 + \delta_0^2/\sigma^2},$$

for $n \le 14$. For $n > 14$, the probability integral may be determined by
quadrature. Some values of the probability integral for $n = 50$ are given in Table II.
A comparison with the integral of the normal curve having the same first two
moments indicates that a sample of somewhat more than 50 is required before
the normal curve becomes a satisfactory approximation to the distribution (27).


TABLE II

$P(\delta^2 < \delta_0^2)$ for $n = 50$

$\delta_0^2/\sigma^2$      (29)       Normal
   .50                    .00000      .00118
   .75                    .00031      .00563
  1.00                    .00647      .02129
  1.25                    .04393      .06418
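The two columns of Table II can be approximated with the Python standard library. This sketch is an illustration rather than the authors' computation: it evaluates the normal column from the mean $2\sigma^2$ and variance (5), and it integrates the fitted density (27) by Simpson's rule as a stand-in for the quadrature mentioned in the text. The step count and the use of logarithms for the constant $c$ are implementation choices.

```python
import math

def normal_cdf(z):
    """Standard normal probability integral."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approximation(n, w):
    """P(delta^2/sigma^2 < w) for the normal curve with the same first two moments."""
    sd = math.sqrt(4.0 * (3 * n - 4)) / (n - 1)
    return normal_cdf((w - 2.0) / sd)

def fitted_cdf(n, w, steps=4000):
    """P(delta^2/sigma^2 < w) under (27), by Simpson quadrature (steps must be even)."""
    mu2 = 4.0 * (3 * n - 4) / (n - 1) ** 2
    q2 = (3 * n**4 - 10 * n**3 - 18 * n**2 + 79 * n - 60) / (8 * n**3 - 50 * n + 48)
    q1 = (4 - mu2 * (q2 + 1) * (q2 + 3)) / (4 - mu2 * (q2 + 1))
    a2 = 2 * (q1 - q2 - 2) / (q2 + 1)
    # log c, with c = a2^(q1-q2-1) / B(q2+1, q1-q2-1); logs avoid overflow
    log_c = ((q1 - q2 - 1) * math.log(a2)
             - math.lgamma(q2 + 1) - math.lgamma(q1 - q2 - 1) + math.lgamma(q1))

    def density(t):
        if t <= 0.0:
            return 0.0
        return math.exp(log_c + q2 * math.log(t) - q1 * math.log(t + a2))

    h = w / steps
    total = density(0.0) + density(w)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0

for w in (0.50, 0.75, 1.00, 1.25):
    print(w, round(fitted_cdf(50, w), 5), round(normal_approximation(50, w), 5))
```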

REFERENCES

[1] "Student," Biometrika, Vol. 19 (1927), p. 158.

[2] E. S. Pearson, Application of Statistical Methods to Industrial Standardisation and
Quality Control, London, 1935, p. 62.

[3] W. Jordan, Astronomische Nachrichten, Vol. 74 (1869), pp. 209-226.

[4] F. R. Helmert, Astronomische Nachrichten, Vol. 88 (1876), pp. 127-132.

[5] E. Vallier, Balistique Expérimentale, Paris, 1894, p. 166.

[6] C. Cranz and K. Becker, Exterior Ballistics (trans. from 2nd German edition), London,
1921, p. 383.

[7] A. A. Bennett, unpublished report to the Chief of Ordnance, U. S. Army, circa 1918.

[8] R. A. Fisher, Phil. Trans. A, Vol. 222 (1922), p. 316.

[9] Karl Pearson (Editor), Tables of the Incomplete Beta-Function, London: Biometrika
Office, 1934.



THE RETURN PERIOD OF FLOOD FLOWS 
By E. J. Gumbel
New School for Social Research 

Introduction. Engineers have used various interpolation formulas to
represent the observed distribution of flood discharges. These formulas are
sometimes constructed ad hoc for a given stream, and have no general meaning. Most
of them are rather complicated. 1 Some authors have tried to introduce upper
and lower limits to the discharges, even though it is doubtful that such limits
exist. Others have introduced the third and fourth moments of the distribution,
in spite of the fact that these numerical values are subject to large errors. For
some formulas it is impossible to give a meaning to the constants; different
formulas applied to the same stream give rather contradictory results; and
consequently there is considerable confusion. For example, Slade [20] has stated that
"the statistical method in whatever form employed is an entirely inadequate
tool in the determination of flood frequencies." According to Saville [19] "the
engineer should satisfy himself that he has used an adequate number of methods,
whether mathematical, graphic or otherwise, which have real support from either
theory or experience, and then form his own judgement."

The main reason for this situation is that these studies have little or no
theoretical basis. The author believes it possible to give exact solutions,
exactitude being interpreted from the standpoint of the calculus of probabilities
[10]. Our solutions are simply the consequences of a truism: "The flood
discharges are the largest values of the discharges." The present study is but an
explanation of this statement.

Many American authors start with a statistical function, which we call the
return period of floods. Therefore we shall first analyse the notion of return
period and show how it can be derived as a consequence of the concept of
distribution. We then give a short résumé of the theory of largest values. The
discharge, and in consequence the flood discharge, is considered as an unlimited
statistical variable; it is not necessary to determine its distribution. We are
justified in representing the observed distribution of flows by one of the
theoretical distributions of largest values. The distribution we choose contains
only two constants, and both have a clear hydrological meaning. The
numerical values are calculated by the method of moments.


1 In recent years many articles discussing this topic have been published by the American 
Society of Civil Engineers and the American Geophysical Union [8]. A review of some of 
the proposed formulas is given in the Water Supply Paper 771 [17]. 



The application of the notion of return period to the largest values leads to a 
simple formula for the return period of the floods. In the last part of this paper 
we represent the flood flows of the Rhône and Mississippi Rivers by our formula.


1. The return period. Let us consider a continuous statistical variable $x$,
having a theoretical distribution $w(x)$. The probability $W(x)$ of a value less
than or equal to $x$, and the probability $P(x)$ of a value greater than or equal to
$x$, are

(1) $$W(x) = \int_{-\infty}^{x} w(z)\, dz, \qquad P(x) = \int_{x}^{\infty} w(z)\, dz,$$

where $z$ denotes the variable of integration. Clearly

(1') $$W(x) + P(x) = 1.$$

Let $n$ be the number of observations. Let $x_m$ ($m = 1, 2, \cdots, n$) be the
observed values arranged in increasing magnitude, where $m$ is the serial number
beginning with the lowest ("from below"). The lowest observation has the
serial number $m = 1$, the highest has the serial number $m = n$. These observed
values will be written $x_1$ and $x_n$ respectively. The number of observations
below or equal to $x_m$ is $m = n\,{}'W(x_m)$, where ${}'W(x_m)$ is the observed relative
number corresponding to the probability $W(x)$. The graphic representation of
this series is called a cumulative histogram.

In hydraulics many authors arrange the observations in decreasing magnitude.
Let ${}_m x$ ($m = 1, 2, \cdots, n$) be these observed values. The serial number $m$ is
counted in a descending scale ("from above"). For the largest value $m = 1$,
for the lowest value $m = n$. The number of observations above or equal to
${}_m x$ is $m = n\,{}'P({}_m x)$, where ${}'P({}_m x)$ corresponds to $P(x)$. The numbers ${}'W(x_m)$
will never decrease; the numbers ${}'P({}_m x)$ will never increase. The $m$th value on
a descending scale is the $(n - m + 1)$th value on an ascending scale. Therefore

(2) $$n\,{}'P({}_m x) = n - n\,{}'W(x_m) + 1,$$

and

(2') $$n P(x) = n - n W(x).$$

The difference between formulas (2) and (2') will play a certain role later.

Different methods are used in statistics in comparing the theoretical values
$W(x)$ or $P(x)$ and $w(x)$ with the corresponding observations ${}'W(x_m)$ or ${}'P({}_m x)$
(cumulative frequencies) and $\Delta\,{}'W(x_m)$ (frequency distribution). They all have
in common an arrangement of observed values according to magnitude.

For the purpose of considering the observations in chronological order, we
introduce a statistical criterion which at first glance may appear to have a new
logical structure. It is assumed here that the observations are made at constant
time intervals, and this interval is considered the unit of time. We suppose
that the observations are homogeneous, i.e., subject to a common set of forces.
Furthermore, we suppose that the events are independent of one another: the
occurrence of a high or low value for $x$ has no influence on the value of any
succeeding observation. Let us choose a low value $x$, and ask the following:
After what number of observations does this or a greater value return? We
calculate the mean of these chronological intervals between every two
consecutive values equal to or greater than $x$. We repeat these operations for a second,
third, . . . till the penultimate value of $x$.

These means are called the observed return periods. The criterion consists of
the comparison of the observed and the theoretical return period for increasing
values of $x$. For a discontinuous variable we could obtain the return period for
a value equal to $x$ (not equal to or greater than $x$). This average time, which
is sometimes used in physics, does not interest us, as our variable, the discharge,
is continuous. We limit our consideration to the return period of a value equal
to or greater than $x$, called: value greater than $x$.

The determination of the theoretical return period is a classical problem:
How many trials must, on the average, be made, in order that an event of a
given probability should happen? Our event, the realization of a value equal
to or greater than $x$, has the probability $P(x) = 1 - W(x)$.

The mean number of trials $T(x)$ which are necessary to obtain our event once
is evidently

(3) $$T(x) = \frac{1}{1 - W(x)},$$

or

(3') $$T(x) = \frac{1}{P(x)}.$$

This value $T(x)$ is the mean chronological interval between two values equal
to or greater than $x$. If we start at the time when such a value has been
observed for the first time, we can interpret $T(x)$ as the theoretical return period
of a value equal to or greater than $x$. We designate it as the theoretical return
period. This concept has not been used in statistics. It is a well-known
concept in hydraulics which was introduced by Fuller [6]. To every theoretical
distribution $w(x)$ there is a corresponding return period $T(x)$ and conversely,
to every theoretical return period $T(x)$ there is a corresponding distribution

(4) $$w(x) = \frac{T'(x)}{T^2(x)},$$

obtained by differentiating (3).

If the variable is without limit to the left, the return period will start with
$T = 1$. If the variable is limited to the left by $\epsilon$, the corresponding return
period will be

(5) $$T(\epsilon) \ge 1 \quad \text{if} \quad W(\epsilon) \ge 0.$$





In the graphic representation, the return period $T(x)$, which has a time
dimension, will be the abscissa and $x$ the ordinate. Therefore we consider $x$ as a
function of $T(x)$; from (4) we obtain

(6) $$\frac{dx}{d \ln T} = \frac{1}{w(x)\, T(x)},$$

where ln signifies the natural logarithm. The increase of $x$ as a function of
$\ln T(x)$ will be very rapid for small values of $T$. For a limited distribution
the same result is obtained, provided the probability $W(\epsilon)$ and the density of
probability $w(\epsilon)$ are sufficiently small. Clearly, the return periods of the three
quartiles are respectively 1 1/3, 2, 4. The return period will always increase
with $x$. It will tend towards infinity even if the variable is limited to the right.
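Formulas (3) and (6) can be illustrated with a short Python sketch. The exponential distribution used in the second part is a hypothetical example chosen because its return period $T(x) = e^x$ makes the check transparent; it is not a distribution proposed in the paper.

```python
import math

def return_period(W):
    """Theoretical return period T = 1 / (1 - W) of equation (3)."""
    if not 0.0 <= W < 1.0:
        raise ValueError("W must lie in [0, 1)")
    return 1.0 / (1.0 - W)

# Return periods of the three quartiles: 4/3, 2 and 4 observation intervals.
quartiles = [return_period(W) for W in (0.25, 0.50, 0.75)]
print(quartiles)

# Hypothetical example: exponential variable with W(x) = 1 - exp(-x).
def exp_return_period(x):
    return return_period(1.0 - math.exp(-x))

def slope_against_log_T(x, h=1e-5):
    """Numerical dx / d ln T, to compare with 1 / (w(x) T(x)) from equation (6)."""
    dlnT = math.log(exp_return_period(x + h)) - math.log(exp_return_period(x - h))
    return 2 * h / dlnT

# For this example w(x) T(x) = exp(-x) exp(x) = 1, so the slope should be 1.
print(slope_against_log_T(1.3))
```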

Let us now consider the calculus of the observed return periods. Instead of
values equal to or greater than $x_m$ we will only speak of values greater than $x_m$.
The observed return period is the interval between the first and the last
observation greater than $x_m$, divided by the number of intervals between all
observations greater than $x_m$. The number of observations greater than $x_m$ is
$n - n\,{}'W(x_m)$. Between these observations there are $n - n\,{}'W(x_m) - 1$ intervals.
This denominator is independent of the chronological order of the observed
values. We can calculate the mean of the observed intervals up to a value $x_m$
such that $n - n\,{}'W(x_m) = 2$. For this value of $x_m$ there are only two
observations, i.e., only one interval. In that case no mean can be calculated.

The numerator, the interval between the first and the last observation greater
than $x_m$, will be $n - 1$, provided that the first and the last value in chronological
order are greater than $x_m$. But in general the first value greater than $x_m$ will
be the $(k + 1)$th in chronological order. The first value greater than $x_m$ found
in the reverse chronological order will be the $(k' + 1)$th. Let $k + k' = l$; then
the interval between the last and the first value greater than $x_m$ is $n - 1 - l$.
The mean observed interval is thus

$${}_0T(x_m) = (n - 1 - l)/(n - 1 - n\,{}'W(x_m)),$$

or

(7) $${}_0T(x_m) = (n - 1 - l)/(n - 1 - m).$$

This magnitude depends only on the chronological order of the first and the
last value greater than $x_m$. It is independent of the chronological order of all
other observations. Even in the case $l = 0$ this value differs from the theoretical
value (3). The observed value surpasses the theoretical value, even if the
frequency ${}'W(x_m)$ is identical with the probability $W(x)$.
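The counting argument behind (7) can be checked on a small chronological record. The Python sketch below uses a hypothetical series of twelve distinct observations and compares the directly averaged gaps with $(n - 1 - l)/(n - 1 - m)$; the function names are illustrative.

```python
def mean_observed_interval(series, threshold):
    """Mean chronological interval between successive values greater than threshold."""
    positions = [t for t, value in enumerate(series) if value > threshold]
    if len(positions) < 3:
        raise ValueError("fewer than two intervals: no mean can be calculated")
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

def formula_7(series, threshold):
    """(n - 1 - l) / (n - 1 - m), with m the number of values not above threshold."""
    n = len(series)
    exceed = [value > threshold for value in series]
    m = n - sum(exceed)
    k = exceed.index(True)                # observations before the first exceedance
    k_prime = exceed[::-1].index(True)    # observations after the last exceedance
    l = k + k_prime
    return (n - 1 - l) / (n - 1 - m)

# Hypothetical record of 12 equally spaced observations.
flows = [3.1, 7.4, 2.2, 5.9, 8.3, 1.7, 4.4, 9.0, 2.8, 6.1, 3.6, 5.0]
print(mean_observed_interval(flows, 5.5), formula_7(flows, 5.5))
```

Only the positions of the first and last exceedance enter the second function, which is the property of (7) discussed in the text.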

In the general case, $l > 0$, this difference is a function of $l$. The number $l$
depends upon the times at which the observations begin and cease; but it is
not a characteristic of the chronological order. As a result of these
disadvantages of formula (7) we prefer to introduce other definitions, in which the
chronological order does not enter. These definitions have an added advantage
in that they are constructed in a manner analogous to the theoretical formula.
The observed value which corresponds to (3) is

(8) $${}'T(x_m) = \frac{n}{n - n\,{}'W(x_m)},$$

or

(9) $${}'T(x_m) = n/(n - m).$$


But this definition of the observed return period is not the only one which
corresponds to (3). Starting with the serial number $m$, in a descending scale,
Fuller [6] puts

(8') $${}''T({}_m x) = \frac{n}{m}.$$

According to this definition, the return period of the $m$th value from below is

(9') $${}''T(x_m) = n/(n - m + 1).$$

This observed return period corresponds to the theoretical return period (3').


TABLE I

Two definitions of the observed return periods

observed     serial number    serial number    exceedance interval    recurrence interval
variable     from below       from above       formula (9)            formula (9')
$x_1$        1                n                n/(n - 1)              1
$x_2$        2                n - 1            n/(n - 2)              n/(n - 1)
$x_m$        m                n - m + 1        n/(n - m)              n/(n - m + 1)
$x_{n-1}$    n - 1            2                n/1                    n/2
$x_n$        n                1                -                      n/1


The difference between (9) and (90 results from the fact that the relation (2) 
between the observed cumulative frequencies 'W(x m ) and r P( m x) differs from the 
relation (2') between the probabilities W(x) and P(: r) The two definitions 
of the observed return periods are related by 

(10) ”T(x m+ i) = '7\x m ) < 'T{x m+ i). 

From a purely logical standpoint the first definition is as justifiable as the
second one. Both are used in hydraulics. In order to avoid confusion between
formulas (9) and (9') Horton [16] calls 'T(x_m) the exceedance interval, i.e., “the
average interval at which an event of given magnitude is exceeded,” whereas
he defines "T(x_m), the recurrence interval, as “the average interval of occurrence
of values equalling or exceeding a given magnitude.” Of course, the exceedance
interval surpasses the recurrence interval. Since both observed intervals
correspond to a common theoretical return period we designate both of them as
observed return periods.
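The two observed return periods can be illustrated with a minimal Python sketch. The function name `observed_return_periods` and the five sample values are assumptions made only for this illustration; the formulas are (9) and (9') as stated above.

```python
def observed_return_periods(values):
    """Return (x_m, exceedance, recurrence) for each observation, ascending.

    exceedance  'T(x_m) = n / (n - m)      formula (9); undefined for m = n
    recurrence  "T(x_m) = n / (n - m + 1)  formula (9')
    """
    xs = sorted(values)
    n = len(xs)
    rows = []
    for m, x in enumerate(xs, start=1):
        exceedance = n / (n - m) if m < n else None  # last value: no exceedance interval
        recurrence = n / (n - m + 1)
        rows.append((x, exceedance, recurrence))
    return rows


# Made-up sample, not data from the paper.
rows = observed_return_periods([31.0, 12.0, 25.0, 18.0, 40.0])
for x, t_exc, t_rec in rows:
    print(x, t_exc, t_rec)
```

In the printed rows the recurrence interval of each value equals the exceedance interval of the value just below it, which is relation (10).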

The difference between formulas (9) and (9') is made clear in Table I.


Each of the definitions (9) and (9') and the theoretical expression T(x) has
different properties. For the lowest observation

$n\,'W(x_1) = 1; \qquad n\,'P({}_n x) = n.$

Therefore

$'T(x_1) = 1 + \dfrac{1}{n - 1}; \qquad "T(x_1) = 1,$

whereas for an unlimited distribution $\lim_{x \to -\infty} T(x) = 1$.

If the number of observations is sufficiently large the numerical differences
between the two observed periods are rather small, except for very large values
of the variable. For the last observation

$n\,'W(x_n) = n; \qquad n\,'P({}_1 x) = 1.$

Therefore the return period 'T(x_n) for the last observation does not exist.
According to the second definition the return period for the last value is equal to
the total number of observations. But in general there is only one observation
of the last value.

The preference given formula (9) over (9') corresponds with the preference 
given to W(x) over P(x) when comparing the theoretical with the observed 
values. Therefore it is natural to count m from below. Since both definitions 
are equally applicable and since they lead to different results for large values of 
the variable, one should not calculate the return period for a small number of 
observations. 

The observed return periods (9) and (9') differ from the theoretical return
period (3) in the same way that the frequencies 'W(x_m) or 'P(_m x) differ from the
probabilities W(x) or P(x). The chronological order enters neither into formula
(7) nor into (9) or (9'). We need not take it into consideration, since the
theoretical return period is obtained from the probability and the observed
return period from the cumulative histogram. Therefore the usual statistical
methods can be used for making the comparison between observed and
theoretical return periods.

The return period is a statistical function like the distribution w(x) or the
probability W(x). No formula for T(x) that contradicts the properties of w(x)
can be accepted. The return period T(x) will contain the same number of
independent constants as the distribution w(x). Consequently the fit of the
theoretical curve T(x) to the observations 'T(x_m) or "T(x_m) cannot be improved by
introducing a new constant without also changing the distribution. The
theoretical curve x = f(T) will fit the observed curves (x_m, 'T(x_m)) and
(x_m, "T(x_m)) in a way that depends upon the fit of W(x) and P(x) to 'W(x_m)
and 'P(_m x).

Let us suppose that w(x) contains k constants; that they are determined by the
method of moments which conserves the arithmetic mean $\bar{x}$, the mean of the
squares $\overline{x^2}$, etc. of the observed distribution. For the return period these
moments have a meaning. Let us consider for the sake of simplicity a positive
variable. The kth moment M_k


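$M_k = \int_0^\infty x^k \, dW(x) = -\int_0^\infty x^k \, d\bigl(1 - W(x)\bigr)$

is according to (3)

(11)    $M_k = k \int_0^\infty \bigl(1 - W(x)\bigr)\, x^{k-1} \, dx = k \int_0^\infty \dfrac{x^{k-1}}{T(x)} \, dx,$

whence for k = 1 and k = 2

(11')    $\bar{x} = \int_0^\infty \dfrac{dx}{T(x)}; \qquad \overline{x^2} = 2 \int_0^\infty \dfrac{x \, dx}{T(x)}.$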


For a given distribution containing two constants, the method of moments
conserves the area and the center of gravity of the reciprocal of the return period.
Even if the method of moments gives the best determination of the constants
for the distribution, it need not give the best determination for the return
period. But if the observed return periods were used for the determination of
the constants we would get two sets, since there are two observed curves having
equal validity, but different values for large x. We will get one and only one
set if the constants are calculated from the observed distribution, for here the
difference between 'T(x_m) and "T(x_m) does not matter. The fact that we do
not take the constants from the observed return periods, but from another
statistical function, might be a cause for deviations between the observed and
the theoretical return periods.
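The moment relations (11') can be checked numerically. The sketch below assumes, purely for illustration, the initial probability W(x) = 1 - exp(-x), for which the mean is 1 and the mean of the squares is 2; the helper `trapezoid` and the truncation point `upper` are likewise assumptions of this example.

```python
import math


def inv_return_period(x):
    # 1/T(x) = 1 - W(x) by (3), for the assumed W(x) = 1 - exp(-x)
    return math.exp(-x)


def trapezoid(f, a, b, steps):
    # Plain composite trapezoidal rule on [a, b].
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


upper = 50.0  # truncation of the infinite range; the neglected tail is tiny
mean = trapezoid(inv_return_period, 0.0, upper, 200_000)
mean_sq = trapezoid(lambda x: 2.0 * x * inv_return_period(x), 0.0, upper, 200_000)
print(mean, mean_sq)
```

The two integrals of the reciprocal return period reproduce the first two moments of the assumed distribution to within the accuracy of the quadrature.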

Once the constants have been found, we compare the observed curves
(x_m, 'T(x_m)) and (x_m, "T(x_m)) with the theoretical curve x = f(T). To avoid
discontinuity the observed return period will be established for all values of x_m
arranged in increasing order.

If the observed return periods for small values of x are systematically smaller
(greater) than the theoretical period, it is reasonable to conclude that there
exists an attraction (repulsion) for small values of the variable and a repulsion
(attraction) for the large values. But it must be remembered that the observed
values have different weights in that the return periods for small values of x are
based on many observations. This number diminishes as x increases. The last
observed return period is based only on two observations. Therefore the
divergence between theory and observation will increase with the variable. With
this precaution the criterion of the return period suggests one cause of difference
between theory and observation. In order to apply this method to the largest
values we must first establish the corresponding distribution.


2. Theory of the largest value. Let x be a statistical variable unlimited to
the right having the distribution w(x). Among the N observed values, one will
be larger than the others. We wish to determine its theoretical value.

According to the principle of multiplication the probability $\mathfrak{W}_N(x)$ that N
values are inferior to x is

(12)    $\mathfrak{W}_N(x) = W^N(x).$

This is the probability of x being the largest value. The largest value is a new
statistical variable which possesses a mode, a mean ū, a standard deviation s
and higher moments. To get the mean the distribution $\mathfrak{w}_N(x)$ of the largest
value is needed. From (12) by differentiation

(13)    $\mathfrak{w}_N(x) = N\, W^{N-1}(x)\, w(x).$

The mode will be the solution of

(13')    $\dfrac{N - 1}{W(x)}\, w(x) + \dfrac{w'(x)}{w(x)} = 0.$


For a given initial distribution w(x) and for small N we have to solve this
equation. But the mean and the moments cannot be obtained in a general way by
the use of the exact distribution (13). However we can reach general solutions
if N is large, provided we limit ourselves to certain classes of initial distributions.
We have studied this problem in previous publications [11 13]. For our present
purpose it is sufficient to give the results in a form due to R. von Mises [18].

We define a large value u of the variable x by

(14)    $N\bigl(1 - W(u)\bigr) = 1.$

This means that the expected number of observations equal to or greater than u
is one. Equation (14) is but another form of definition (3). The mean number
of trials is used in (3) whereas the original variable x is used in (14).

The probability a du that a value greater than u will be contained between u
and u + du is given by

(15)    $a = \dfrac{w(u)}{1 - W(u)}.$


Obviously a and u are functions of N and the constants in the initial
distribution w(x). There are two limiting forms of the probability (12)

$\lim_{N \to \infty} W^N(x) = F(x); \qquad \lim_{N \to \infty} W^N(x) = \mathfrak{W}(x).$

If

(16)    $\lim_{u \to \infty} a u = k > 0,$

we obtain

(17)    $F(x) = e^{-(u/x)^k}.$

This probability function was first established by Fréchet [5]. If

(18)    $\lim_{x \to \infty} \dfrac{d}{dx}\left(\dfrac{1 - W(x)}{w(x)}\right) = 0,$

we obtain

(19)    $\mathfrak{W}(x) = e^{-e^{-a(x - u)}}.$

This probability function is due to R. A. Fisher [4]. Let us consider the first
limit. The initial distributions which lead to it belong to the Pareto type.
For this distribution

$w(x) = \dfrac{k}{x^{k+1}}; \qquad W(x) = 1 - \dfrac{1}{x^k}; \qquad x \ge 1,$

and condition (16) holds; for any value of x

$\dfrac{x\, w(x)}{1 - W(x)} = k.$

The distribution f(x) of the largest value, which corresponds to (17), is

(20)    $f(x) = \dfrac{k}{u} \left(\dfrac{u}{x}\right)^{k+1} e^{-(u/x)^k}.$


The mode $\tilde{x}$ of the largest value is the solution of f'(x) = 0, hence

$\dfrac{k + 1}{x} = \dfrac{k u^k}{x^{k+1}},$

or

(21)    $\tilde{x} = u \left(\dfrac{k}{k + 1}\right)^{1/k}.$

According to the definition (14) the mode of the largest value will increase
with N. For a finite number of observations, which is always the case, the
mode will be limited. But the moments of order k or higher will not exist.
For k < 1, no moment will exist. For k < 2, only the first moment, the mean,
exists, and so on.
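As a hedged numerical check of the mode formula (21), the following sketch evaluates the density (20) on a grid and compares the location of its maximum with the closed form. The values u = 2 and k = 3, the grid, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def frechet_density(x, u, k):
    # Density (20) of the largest value for a Pareto-type initial distribution.
    return (k / u) * (u / x) ** (k + 1) * math.exp(-((u / x) ** k))


def mode_formula(u, k):
    # Closed form (21).
    return u * (k / (k + 1)) ** (1.0 / k)


u, k = 2.0, 3.0  # illustrative constants
grid = [0.5 + i * 1e-4 for i in range(45_001)]  # x from 0.5 to 5.0
numeric_mode = max(grid, key=lambda x: frechet_density(x, u, k))
print(numeric_mode, mode_formula(u, k))
```

The grid maximum agrees with (21) up to the grid spacing, and the mode lies below u, as the factor k/(k + 1) < 1 requires.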

Let us consider now the second limit (19). The initial distributions which
lead to it belong to the exponential type. For this distribution [14]

$w(x) = e^{-x}; \qquad W(x) = 1 - e^{-x}; \qquad x \ge 0,$

and for any value of x

$\dfrac{d}{dx}\left(\dfrac{1 - W(x)}{w(x)}\right) = 0,$

which means that condition (18) is fulfilled. Most of the distributions used in
statistics belong to this type. According to (19) the distribution of the largest
value is

(22)    $\mathfrak{w}(x) = a\, e^{-a(x - u) - e^{-a(x - u)}}.$

If we introduce a reduced variable y without dimension by the linear
transformation

(23)    $y = a(x - u),$

we get the reduced probability $\mathfrak{W}(y)$

(24)    $\mathfrak{W}(y) = \mathfrak{W}(x) = e^{-e^{-y}}.$

The numerical values of this function, calculated by means of Becker’s tables [1],
are given in Table II, cols. 1 and 2. The reduced distribution

(25)    $\mathfrak{w}(y) = e^{-y - e^{-y}}$

makes clear the meaning of u: the distribution has one and only one maximum,
which occurs for the reduced value y = 0. Therefore u is the mode of the
largest value for a given set of N observations. For an initial distribution w(x)
satisfying (18), and for large N, definition (3) of the return period as a function
of x becomes identical with relation (14) which involves the number of
observations N and the corresponding most probable value u.
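The passage from the exact probability (12) to the limiting form (19) can be sketched numerically. The example assumes the exponential initial distribution W(x) = 1 - exp(-x) mentioned above; N = 365 and the function names are chosen only for illustration.

```python
import math

N = 365  # number of initial observations (illustrative choice)


def W(x):
    # Assumed initial probability of exponential type.
    return 1.0 - math.exp(-x)


def w(x):
    # Corresponding initial density.
    return math.exp(-x)


u = math.log(N)           # solves N * (1 - W(u)) = 1, definition (14)
a = w(u) / (1.0 - W(u))   # definition (15); equals 1 for this W


def exact_largest(x):
    # Formula (12): probability that all N values are below x.
    return W(x) ** N


def limiting_largest(x):
    # Formula (19): exp(-exp(-a (x - u))).
    return math.exp(-math.exp(-a * (x - u)))


for y in (-1.0, 0.0, 1.0, 3.0):
    x = u + y / a
    print(y, round(exact_largest(x), 5), round(limiting_largest(x), 5))
```

For this choice the two probabilities differ only in the third or fourth decimal place, which gives a feeling for how large N has to be before (19) is a usable approximation.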

We wish to decide which distribution of the largest value is to be used to
represent the given observations. This decision depends, according to (16) and
(18), on the nature of the initial distribution at the extreme values of the
variable. If the law of the observed initial variable is known, a precise answer
can be given. But generally speaking, a distribution chosen to represent given
observations is nothing but an interpolation formula. Formulas having different
analytical properties may all give satisfactory results. One might fulfill
condition (16), and another (18). The conditions apply to the differential coefficient,
whereas the initial observations are always discontinuous. Therefore they will
not enable us to decide which, if any, of the conditions is met. For extreme
values of the variable x the observed differences are large and nonuniform, and
there is therefore no way to replace the differentiation by a finite difference.
Consequently we have to use the observations of the largest values to control
the two competing theories and not the conditions. The fact that distribution
(20) has higher moments only under certain conditions is a strong practical
argument in favor of distribution (22). Therefore the following development
will be based on this distribution.

It can be shown that the mean error $\vartheta$ of distribution (22) is related to the
constant a by

(26)    $\vartheta = 0.98/a.$

Therefore the constant u is the most probable largest value for N observations
and 1/a a multiple of the mean error.


TABLE II

Probabilities and return periods of largest values

                                             Flood discharges per second
reduced      probability    return period    in cubic meters   in 1000 cubic feet
variable y   $\mathfrak{W}(x)$   log T(x)         x, Rhône R.       x, Mississippi R.

-2.00        0.00062        0.000
-1.75        0.00317        0.001
-1.50        0.01131        0.005            1355              803
-1.25        0.03049        0.013            1492              869
-1.00        0.06599        0.030            1629              936
-0.75        0.12039        0.056            1766              1002
-0.50        0.19230        0.093            1903              1069
-0.25        0.27693        0.141            2040              1135
 0.00        0.36788        0.199            2177              1202
 0.25        0.45896        0.267            2314              1268
 0.50        0.54524        0.342            2451              1335
 0.75        0.62352        0.424            2588              1401
 1.00        0.69220        0.512            2725              1468
 1.25        0.75088        0.604            2862              1534
 1.50        0.80001        0.699            2999              1601
 1.75        0.84048        0.797            3136              1667
 2.00        0.87342        0.899            3273              1734
 2.25        0.89996        1.000            3410              1800
 2.50        0.92119        1.103            3547              1867
 2.75        0.93807        1.208            3686              1933
 3.00        0.95143        1.314            3822              2000
 3.25        0.96197        1.420            3959              2066
 3.50        0.97025        1.527            4096              2133
 3.75        0.97675        1.634            4233              2199
 4.00        0.98185        1.741            4370              2266
 4.25        0.98584
 4.50        0.98895
 4.75        0.99138
 5.00        0.99329
 5.25        0.99477
 5.50        0.99592
 5.75        0.99682
 6.00        0.99752


TABLE III

Observed return periods
Rhône, Lyon (France) (1826-1936)

Flood discharge   Serial number   Return period
x_m               m               log 'T(x_m)

 899                1             0.004
1172                2             0.008
1231                3             0.012
1272                4             0.016
1272                5             0.020
1432                6             0.024
1432                7             0.028
1439                8             0.032
1444                9             0.037
1502               10             0.041
1541               11             0.045
1560               12             0.050
1639               13             0.054
1706               14             0.058
1780               15             0.063
1829               16             0.068
1850               17             0.072
1857               18             0.077
1913               19             0.081
1913               20             0.086
1934               21             0.091
1955               22             0.096
1992               23             0.101
1992               24             0.106
2006               25             0.111
2006               26             0.116
2013               27             0.121
2050               28             0.126
2050               29             0.131
2072               30             0.137
2094               31             0.142
2101               32             0.148
2115               33             0.153
2145               34             0.159
2145               35             0.164
2153               36             0.170
2160               37             0.176
2168               38             0.182
2175               39             0.188
2206               40             0.194
2206               41             0.200
2206               42             0.206
2221               43             0.213
2236               44             0.219
2240               45             0.226
2258               46             0.232
2281               47             0.239
2296               48             0.246
2327               49             0.253
2342               50             0.260
2358               51             0.267
2381               52             0.274
2420               53             0.282
2444               54             0.289
2452               55             0.297
2467               56             0.305
2475               57             0.313
2475               58             0.321
2475               59             0.329
2491               60             0.338
2514               61             0.346
2514               62             0.355
2514               63             0.364
2514               64             0.373
2538               65             0.382
2554               66             0.392
2580               67             0.402
2594               68             0.412
2594               69             0.422
2594               70             0.432
2602               71             0.443
2626               72             0.454
2627               73             0.465
2643               74             0.477
2675               75             0.489
2675               76             0.501
2773               77             0.514
2773               78             0.527
2773               79             0.540
2839               80             0.554
2856               81             0.568
2881               82             0.583
2881               83             0.598
2965               84             0.614
3007               85             0.630
3050               86             0.647
3058               87             0.666
3067               88             0.684
3067               89             0.703
3126               90             0.723
3179               91             0.744
3214               92             0.766
3250               93             0.790
3266               94             0.825
3293               95             0.841
3310               96             0.869
3310               97             0.899
3354               98             0.931
3426               99             0.966
3444              100             1.004
3444              101             1.045
3480              102             1.091
3606              103             1.142
3625              104             1.200
3708              105             1.267
3801              106             1.346
3810              107             1.443
3905              108             1.568
4096              109             1.744
4105              110             2.045
4390              111

$\sum x_m = 276{,}773. \qquad \sum x_m^2 = 744{,}538{,}565.$


The arithmetic mean ū of distribution (22) is [4]

(27)    $\bar{u} = u + \dfrac{c}{a},$

where c = 0.5772157 is Euler’s constant. The standard deviation s is

(28)    $s = \dfrac{\pi}{a\sqrt{6}}.$

Therefore

(29)    $\bar{u} = u + 0.45005\,s.$

The reduced variable y introduced by (23) is related to the reduced variable

$z = \dfrac{x - \bar{u}}{s}$

by

(30)    $z = \dfrac{\sqrt{6}}{\pi}\,(y - c).$

The substitution of the numerical values leads to

(30')    $z = 0.77970\,y - 0.45005.$

Conversely,

(31)    $y = 1.28255\,z + 0.57722.$

The value

(32)    $v = s/\bar{u},$

the coefficient of variation, is related to the product au. By (27)
$au = a\bar{u} - c$ and by (28)

(33)    $au = \dfrac{\pi}{\sqrt{6}\,v} - c.$

Therefore the numerical value of au can also be considered as a characteristic
of an observed distribution of largest values.
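The numerical constants in (29) to (31) and relation (33) follow from (27) and (28) alone. A minimal sketch, in which the names `ratio` and `a_times_u` are introduced only for this illustration:

```python
import math

c = 0.5772157                      # Euler's constant as quoted in the text
ratio = math.sqrt(6.0) / math.pi   # equals 1/(a s) by (28)

print(round(ratio, 5))             # coefficient of y in (30')
print(round(c * ratio, 5))         # constant in (29) and (30')
print(round(1.0 / ratio, 5))       # coefficient of z in (31)


def a_times_u(v):
    # Relation (33): the product a*u as a function of the
    # coefficient of variation v = s / mean.
    return math.pi / (math.sqrt(6.0) * v) - c
```

Because `a_times_u` depends on the observations only through v, two streams with the same coefficient of variation share the same product au.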

For the two constants we calculate for the observed distribution of largest
values the two first moments

(34)    $\bar{u} = \dfrac{1}{n} \sum_{m=1}^{n} x_m,$

and

(35)    $\overline{u^2} = \dfrac{1}{n} \sum_{m=1}^{n} x_m^2.$

To get the observed standard deviation we use the Gaussian formula

(36)    $s = \sqrt{\left(1 + \dfrac{1}{n - 1}\right)\left(\overline{u^2} - \bar{u}^2\right)}.$

According to (28) and (27)

(37)    $\dfrac{1}{a} = 0.7796968\,s,$

and

(38)    $u = \bar{u} - \dfrac{0.5772157}{a}.$


These formulas give the two constants in the distribution of largest values. 
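The calculation (34) to (38) can be sketched in a few lines of Python. The function name is an assumption of this example; the inputs are the number of observations and the two sums quoted at the end of Table III for the Rhône.

```python
import math

EULER_C = 0.5772157  # Euler's constant as quoted in the text


def fit_largest_value_constants(n, sum_x, sum_x2):
    """Method-of-moments constants of distribution (22) from n, sum x, sum x^2."""
    mean = sum_x / n                                               # (34)
    mean_sq = sum_x2 / n                                           # (35)
    s = math.sqrt((1.0 + 1.0 / (n - 1)) * (mean_sq - mean ** 2))   # (36)
    inv_a = math.sqrt(6.0) / math.pi * s                           # (37)
    u = mean - EULER_C * inv_a                                     # (38)
    return mean, s, inv_a, u


# Sums quoted at the end of Table III (Rhône, n = 111).
mean, s, inv_a, u = fit_largest_value_constants(111, 276_773, 744_538_565)
print(round(mean, 1), round(s, 1), round(inv_a, 1), round(u, 1))
```

Carried out in floating point, the result may differ from hand-computed values in the last printed digit; the sketch is meant to show the order of operations rather than to replace the published constants.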


3. Flood flows interpreted as largest values. We will now apply the theory
of largest values to flood flows. Let us consider the daily flow as a statistical
variable, unlimited to the right. This idea is not new. The formulas proposed
by Fuller [7], Hazen [15], and numerous other authors all incorporate this
assumption. Gibrat [9] supposes that the daily flows vary according to Galton’s
distribution. Instead of postulating a specific formula for the distribution of
flows we shall only suppose that it belongs to the usual exponential type, which
means that condition (18) is fulfilled.

We define a flood as being the largest value of the N = 365 daily flows. The
flood flows are therefore the largest values of flows. This commonplace implies
the distinction between floods and inundations. For each year there exists one
or more floods of the same magnitude, but there might exist several different
inundations or none at all. If there are several inundations in a year the
greatest one will be a flood; but a flood need not be an inundation: even a
dry year has a flood. We limit ourselves to floods, assume that N = 365 is a
large number, and represent the distribution of annual floods by the distribution
(22) of largest values.

There have been objections to the concept that the daily flow is an unlimited
variable. Horton [16] believes that this implies the absurd idea of unlimited
floods. This opinion is shared by Slade [20], who claims that there is a definite
upper limit to the magnitude of the floods for a given stream. The theory of
largest values confirms only partially Horton’s opinion. If we should choose
distribution (20), the most probable annual flood will be limited. For this
distribution, however, it might happen that the mean annual flood has no
meaning. To avoid this we have chosen distribution (22), for which the mean
annual flood and all the moments will be finite. A further justification of the
use of (22) might be derived from the fact that Galton’s distribution belongs to
the exponential type. As a final argument, numerical calculations show that
formula (22) gives a better fit to the observed distributions of flows.

The variable x is the annual flood flow measured in cubic meters or cubic
feet per second. The mean ū is the annual mean flood, whereas u is the most
probable annual flood. The value s is the standard deviation of the
distribution of annual floods. Finally y is called the reduced flood.

The distribution (22) possesses the properties of the observed distribution of
flood flows. It is asymmetrical, rising rather quickly but falling rather slowly.
The modal value is to the left of the mean (see Fig. 3).

To apply the theory of return periods let us consider the event of the highest 
annual discharge being greater than x. We have to replace in formula (3) the 
general probability W(x) by the probability of flood discharges (19). The 
number of observations n is the number of years for which observations exist. 

To use formula (3) we have to suppose that the intervals between the suc¬ 
cessive floods are all equal to one year. This assumption conforms more or less 
to the seasonal nature of floods. 

The return period of a flood greater than x

(39)    $T(x) = \dfrac{1}{1 - e^{-e^{-y}}}$

is the arithmetic mean of the intervals between two years which have a flood
discharge greater than x; the discharges for the intervening years are all less
than x. Therefore T(x) is the mean of the number of years for which x will be
surpassed once. Formula (39) gives the meaning of u from the standpoint of
the return period. For y = 0

$T(u) = \dfrac{e}{e - 1}.$

The return period T(u) of the most probable annual flood is 1.58198 years. In
other words, the constant u is the flood discharge with return period

(40)    $\log T(u) = 0.19920,$

where log signifies the common logarithm. The return period of the mean
annual flood is by (27) and (39) equal to 2.32762 years.
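The two return periods just quoted can be recomputed from (39). In this sketch the function name `return_period` is an assumption; the reduced value of the mean annual flood is y = c, which follows from (23) and (27).

```python
import math

EULER_C = 0.5772157  # Euler's constant as quoted in the text


def return_period(y):
    # Formula (39): T = 1 / (1 - exp(-exp(-y))), with y = a (x - u).
    return 1.0 / (1.0 - math.exp(-math.exp(-y)))


t_mode = return_period(0.0)       # most probable annual flood, x = u
t_mean = return_period(EULER_C)   # mean annual flood, y = c
print(t_mode, math.log10(t_mode), t_mean)
```

The mean annual flood thus has a longer return period than the most probable one, in line with the mode lying to the left of the mean.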

Let us now consider the relation between the flood discharge x and its return
period for small and large values of x. To small values of x correspond large
negative values of y and therefore return periods T approximating 1. The
distribution (25) of the largest values being unlimited, the flood discharge
considered as a function of log T will by (6) increase rapidly at first. To large
values of x correspond large values of y and T(x). If we introduce the natural
logarithm, (39) gives

$-\ln\left(1 - \dfrac{1}{T(x)}\right) = e^{-y}.$

For large values of x, viz., T(x) ≥ 10, it is sufficiently accurate to use

$\dfrac{1}{T(x)} = e^{-y},$

so that

(41)    $y = \ln T(x).$

If the common logarithm is used,

(42)    $\log T(x) = 0.434294\,a(x - u).$

The logarithm of the mean number of years for which the flood discharge will
once be exceeded converges towards a linear function of x. This property of
the distribution of largest values was established by M. Coutagne [2]. Let us
write

(43)    $x = u + \dfrac{2.30258}{a} \log T(x).$

Then 1/a can be considered as a measure of the increase of a flood discharge
with respect to the logarithm of time.

According to the general formulas (6) and (42) the shape of the return period
as a function of the flood discharge x is as follows: at the beginning, i.e., for small
flood discharges, the return periods are close to 1 and increase very slowly. At
the end, i.e., for large flood discharges, the logarithm of the return period
converges to a linear function of x.
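How quickly the exact relation (39) approaches the linear form (42) can be seen from a short comparison in the reduced variable y. The function names are assumptions of this sketch.

```python
import math


def log_T_exact(y):
    # Common logarithm of the exact return period (39).
    return math.log10(1.0 / (1.0 - math.exp(-math.exp(-y))))


def log_T_asymptotic(y):
    # Formula (42) written with y = a (x - u): log T = 0.434294 y.
    return 0.434294 * y


for y in (0.0, 1.0, 2.25, 4.0, 6.0):
    print(y, round(log_T_exact(y), 3), round(log_T_asymptotic(y), 3))
```

The gap between the two columns shrinks steadily as y grows, and the exact value stays above the linear approximation throughout.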

Another form of (43) is

(44)    $\dfrac{x}{u} = 1 + \dfrac{2.30258}{au} \log T(x).$

The ratio of the flood discharge which will be exceeded in the mean once in T
years to the modal annual flood converges to a linear function of the logarithm
of the return period. The constant 1/au of dimension zero depends, by (33),
on the coefficient of variation. Its value is a characteristic of the stream. If
we introduce the arithmetic mean ū and the standard deviation s we obtain
by (42), (27), and (28)

$x = \bar{u} - 0.45005\,s + (0.77970)(2.30258)\,s \log T(x).$

Therefore, approximately,

(45)    $\dfrac{x}{\bar{u}} = 1 - \dfrac{9}{20}\,v + 1.7953\,v \log T(x).$

The right hand member of this linear equation contains only one constant, the
coefficient of variation of the floods. Finally by (42) and (31)

(46)    $\log T(x) = 0.25068 + 0.55700\,\dfrac{x - \bar{u}}{s}.$

There is still another way of interpreting these asymptotic formulas. Let
T(2x) be the return period of the value 2x, then by (43)

$2x = u + \dfrac{\ln T(2x)}{a};$

therefore

$2 = \dfrac{au + \ln T(2x)}{au + \ln T(x)},$

and finally

(47)    $T(2x) = T^2(x)\, e^{au}.$

The return period of a flood of magnitude 2x is equal to the square of the
return period of x multiplied by a factor which depends only upon the coefficient
of variation.
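Relation (47) can be verified directly from the asymptotic formula (41). The constants u and a below are hypothetical and serve only to show that both sides agree.

```python
import math


def T_asymptotic(x, u, a):
    # Asymptotic relation (41): ln T = a (x - u).
    return math.exp(a * (x - u))


u, a = 1000.0, 1.0 / 200.0  # hypothetical constants, not from the paper
x = 1500.0
lhs = T_asymptotic(2 * x, u, a)
rhs = T_asymptotic(x, u, a) ** 2 * math.exp(a * u)
print(lhs, rhs)
```

Both sides agree up to floating-point rounding, as the derivation requires.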

All these asymptotic formulas are good approximations only for return periods
above ten years, which means according to Table II, y ≥ 2.25 or according
to (23), (30) and (31) x ≥ ū + 1.3s. The corresponding value of the flood
probability is by (3) $\mathfrak{W}(x) \ge 0.9$. The consequences of (41) can be applied to
only 10% of the observations, i.e. to the large flood discharges. Their observed
return periods are based on a few observations and may therefore differ
considerably from the theoretical values. In spite of the above restrictions the
linear formula (43) has a meaning for values of T equal to or greater than unity.
We now ask: How will the most probable largest value increase with the number
of observations? This number of years can again be called T. The answer to
the above question requires the solution of (13') where the distribution (25) of
largest values $\mathfrak{w}(y)$ must be introduced as the initial distribution w(x).

From (24)

$(T - 1)\,e^{-y} - 1 + e^{-y} = 0,$

or

$T e^{-y} = 1,$

which is identical with (41). For T = 1 the most probable annual flood is of
course u. Therefore the relation (41), valid for T ≥ 1, means: The most
probable flood u(T) to be reached within T years is a linear function of the logarithm
of T

(41')    $u(T) = u + \dfrac{\ln T}{a}.$

The constant 1/a is the slope of this straight line. The results (41)-(46) are
related to Fuller’s well-known formula [6]. This author, the first to investigate
flood flows systematically, proposed a linear relation between the logarithm of
the return period and the arithmetic mean of the flood discharges greater than
the mth value (m taken from above). A similar empirical formula has been
stated by Lane [7] and has been applied by Saville [19]. The similarities and
differences between these interpolation formulas and our theory can be stated
in the following way: If we start from the theory of largest values we reach
these formulas as asymptotic expressions for the return period of large floods.
Considered this way, our theory gives a certain justification to Fuller’s
hypothesis. But Fuller’s and similar formulas were intended to apply to all flood
discharges. Now, the distribution of the flood discharges (4) corresponding to
these return periods does not fit the observations. It can be shown that these
formulas involve the assumption of a simple exponential distribution φ(x) for
the flood discharges

(48)    $\varphi(x) = \dfrac{1}{\bar{u} - \varepsilon}\, e^{-(x - \varepsilon)/(\bar{u} - \varepsilon)}$

and the existence of a lower limit ε of the flood discharges given by ε = ū - s.
In Fuller’s formula all flood discharges must be greater than 2/3 of the mean
annual flood. The density of probability always diminishes with increasing
magnitude of the flood. This neglects the ascending branch (about one third)
of the distribution of floods (see Fig. 3) and is incompatible with the observed
facts. We therefore prefer our formula which takes account of the total
variation, but we do not minimize the importance of Fuller’s work which has led to
much valuable research.

Formula (39) gives the theoretical return periods T(x) as a function of the
reduced flood discharge y, and holds for the entire range of observations. The
general numerical values are given in Table II, cols. 1 and 3. For a given stream,
the return period of a flood discharge greater than x depends by (23) upon the
two constants a and u. If these values have been calculated by (37) and (38)
the theoretical flood discharge x corresponding to T(x) is obtained by the
linear transformation

(49)    $x = u + y/a.$


The asymptotic formula (42) suggests the coordination of the flood discharges 
x and the logarithm of the return periods. 
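Read in the other direction, (39) and (49) give the flood discharge that belongs to a prescribed return period. The sketch below inverts (39) for y and applies the linear transformation; the function names and the constants u and 1/a are hypothetical and chosen only to make the example concrete.

```python
import math


def reduced_flood(T):
    # Inverse of (39): the y for which T = 1 / (1 - exp(-exp(-y))), T > 1.
    return -math.log(-math.log(1.0 - 1.0 / T))


def flood_discharge(T, u, inv_a):
    # Linear transformation (49): x = u + y / a.
    return u + inv_a * reduced_flood(T)


# Hypothetical constants (same units as the discharges), for illustration only.
u, inv_a = 1000.0, 200.0
for T in (2, 10, 100):
    print(T, round(flood_discharge(T, u, inv_a), 1))
```

For T = 10 the reduced flood comes out very close to 2.25, the value at which Table II shows log T(x) = 1.000.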

4. Rhône and Mississippi Rivers. We think that our system of formulas is
simple, logically consistent and free of artificial assumptions. Now it remains
to be shown that the arithmetic involved is simple and that the results fit the
observations. For the Rhône we shall analyze the observed cumulative
frequency, the distribution, and the return periods. For the Mississippi River
we shall limit ourselves to the return periods.

For each year we choose the maximum of the daily discharges (we do not use
momentary peaks). The 111 values x_m for the Rhône 1826-1936 published by
Coutagne [3] and arranged in order of increasing magnitude are given in Table III
(col. 1). The supposition that the intervals between consecutive floods are all
equal to one year is not always true. Only 77 of the 111 floods occurred between
October and March, whereas 34 were scattered throughout the year. But the
differences in the lengths of the intervals compensate each other. The second
column of Table III contains the serial number m. According to (9) we
calculate for the mth observed flood discharge x_m, taken in ascending magnitude,
the logarithm of the observed return period log n/(n - m) (col. 3), where n = 111
and m = 1, 2, ..., 110, and obtain the exceedance intervals. The other
observed curve, the recurrence interval, is obtained by (10) through the
coordination of x_{m+1} and log n/(n - m). Both curves are plotted in Fig. 1. The
recurrence and exceedance intervals differ for the large flood discharges. The
observed flood discharges arranged in increasing magnitude are plotted in the
cumulative histogram, Fig. 2.

TABLE IV

Calculation of constants

Stream and observation station                 Rhône, Lyon     Mississippi River,
                                               (France)        Vicksburg (Miss.)
                                               1826-1936       1890-1939

Number of observations           n             111             50
Annual mean flood                ū             2,493.5         1,355.6
Mean squared flood               $\overline{u^2}$   6,707,555.0     1,951,828.8
Standard deviation               s             703.1           341.3
Constant                         1/a           548.2           266.1
Most probable annual flood       u             2,177.0         1,201.9

TABLE V

Observed and theoretical distributions of flood discharges
Rhône

Reduced     Variable   Midpoints    Observed            Theoretical         Cumulative
variable    x          x + Δx/2     distribution        distribution        frequency
y                                   $111\,\Delta'\mathfrak{W}(x)$   $111\,\Delta\mathfrak{W}(x)$   $111\,\mathfrak{W}(x)$

-2.75        670
-2.50                   807          1                                       0.00
-2.25        944                                         0.01                0.01
-2.00                  1081          1                   0.34                0.07
-1.75       1218                                         1.19                0.35
-1.50                  1355          7                   3.03                1.26
-1.25       1492                                         6.07                3.38
-1.00                  1629          5                   9.98                7.33
-0.75       1766                                        14.02               13.36
-0.50                  1903         13                  17.38               21.35
-0.25       2040                                        19.49               30.74
 0.00                  2177         21                  20.21               40.84
 0.25       2314                                        19.68               50.95
 0.50                  2451         19                  18.26               60.52
 0.75       2588                                        16.31               69.21
 1.00                  2725         14                  14.14               76.83
 1.25       2862                                        11.97               83.35
 1.50                  2999          9                   9.94               88.80
 1.75       3136                                         8.15               93.29
 2.00                  3273          8                   6.61               96.95
 2.25       3410                                         5.30               99.90
 2.50                  3547          6                   4.23              102.25
 2.75       3686                                         3.45              104.13
 3.00                  3822          4                   2.65              105.70
 3.25       3959                                         2.00              106.78
 3.50                  4096          2                   1.64              107.70
 3.75       4233                                         1.28              108.42
 4.00                  4370          1                   1.01              108.98
 4.25       4507                                         0.79              109.43
 4.50                  4644          0                   0.61              109.77
 4.75       4781                                         0.48              110.04
 5.00                  4918                              0.38              110.25
 5.25       5055                                         0.30              110.42
 5.50                  5192                              0.23              110.55
 5.75       5329                                         0.18              110.65
 6.00                  5466                              0.27              110.73

Total                              111                 111.00

To compare these observations with our theory, we calculate the two
constants 1/a and u according to the formulas (34)-(38). The values $\sum x_m$ and
$\sum x_m^2$ are given at the end of Table III. Division by n = 111 gives the mean
flood ū and the mean squared flood $\overline{u^2}$ (Table IV). The Gaussian correction
being 1 + 1/110 we obtain from formula (36) the standard deviation s (Table IV)
and finally from (37) and (38) the constant 1/a and the most probable annual
flood u. From the numerical values in Table IV the linear transformation (49)
for the Rhône is

$x = 2177.03 + 548.19\,y.$



This leads to the determination of the theoretical flood discharges. The
theoretical return periods log T(x) are given in Table II, col. 3 as a function of the
reduced variable y and of x (col. 4). The discharges x obtained by letting
y take on the values -2.75 to 6.00 in the linear transformation are given in
Table V, cols. 2 and 3 and plotted in Fig. 1. The distances Δx used in the
calculations of the theoretical discharges are 1/(4a) = 137.05.

Along the abscissa are plotted the logarithm of the return periods and the
return periods in years; along the ordinate are plotted the corresponding flood
discharges and the modal annual flood u. The straight line from the point (u, 0)
to the asymptote gives the most probable flood as a function of time. The
theoretical curve corresponds quite closely with the general course of the
observations. For small floods the theoretical return periods are practically
identical with the observed values. But for the very large floods the theoretical
curve surpasses both the exceedance and recurrence intervals.

Fig. 1. Rhône at Lyon (France) 1826-1936
Observations, Table III: recurrence intervals, exceedance intervals. Return
periods: theory, Table II, cols. 3 and 4; extrapolation.

The observed cumulative histogram is shown in Fig. 2. We calculate from
Table II, col. 2, the frequencies 111 𝔚(x) (Table V, col. 6). These theoretical
values (x, 111 𝔚(x)) are also plotted in Fig. 2. The agreement between theory
and observations is very good.

For the comparison of the observed and theoretical distributions of the flood
discharges we use what might be called the natural classification. For the
observations, the length of the class intervals and the beginning of the first class
interval are arbitrary. In order to obtain the observed distribution of the flood
discharges, it is natural to use the theoretical class intervals set forth in Table V,
col. 2. The data of the third column can be interpreted as the midpoints of the
class intervals given in col. 2. The frequencies for these class intervals are
obtained from Table III, and are given in Table V, col. 4. The observed
distribution is shown in Fig. 3. To obtain the corresponding theoretical distribution we
calculate from Table V, col. 6, the difference between two cumulative frequencies
disjoined by one, i.e., we pair consecutively the first and third, the second and
fourth items and so on. This theoretical distribution given in col. 5 and the
observed distribution are based on class intervals of the same length. Fig. 3
shows that the theoretical distribution Δ𝔚(x) of the largest values agrees in a
satisfactory way with the observed distribution Δ'𝔚(x) of the flood discharges.

Fig. 2. Cumulative Frequency of the Flood Discharges. Rhône, Lyon (France) 1826-1936
Observations, Table III, cols. 1 and 2; Theory, Table V, cols. 2, 3 and 6.
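The differencing rule "disjoined by one" can be illustrated with a few consecutive cumulative frequencies from Table V, col. 6. This is only a sketch; the helper name `disjoined_differences` is hypothetical.

```python
def disjoined_differences(cumulative):
    """Return c[i+1] - c[i-1] for every interior index i.

    Each difference spans two consecutive steps of the cumulative column,
    so the resulting classes all have the same length.
    """
    return [round(cumulative[i + 1] - cumulative[i - 1], 2)
            for i in range(1, len(cumulative) - 1)]


# Six consecutive entries of Table V, col. 6 (cumulative frequencies).
cum = [99.90, 102.25, 104.13, 105.70, 106.78, 107.70]

if __name__ == "__main__":
    print(disjoined_differences(cum))
```

The four differences obtained here, 4.23, 3.45, 2.65 and 2.00, coincide with the corresponding entries of Table V, col. 5.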
Fig. 3. Distribution of the Flood Discharges. Rhône, Lyon (France) 1826-1936
Observations, Table V, cols. 2, 3 and 4; Theory, Table V, cols. 2, 3 and 5.

Table VI, col. 1, gives the corrected³ flood discharges x_m, measured in units of
1000 cubic feet per second, for the Mississippi River at Vicksburg (1890-1939),
(n = 50), arranged according to increasing magnitude; col. 2 gives the serial
number m. We calculate the logarithm of the observed return periods log
n/(n - m), (col. 3). The observations (x_m, log 'T(x_m)) and (x_{m+1}, log "T(x_m))
are plotted in Fig. 4. The constants obtained by formulas (34)-(38) are shown
in Table IV. By (49) the theoretical floods x corresponding to the return
periods T(x) presented in Table II, col. 3, are

x = 1201.98 + 266.14 y.

These floods are given in Table II, col. 5. The class interval used is

1/(4a) = 66.5.

³ These data have been put at my disposal through the courtesy of Mr. A. E. Brandt of
the U. S. Department of Agriculture.






The theoretical curve (x, log T(x)), plotted in Fig. 4, agrees in a very satisfactory
way with the observations. For the large floods the theoretical return periods
are between the exceedance and recurrence intervals.

The calculations of the theoretical return periods for other streams, e.g. the
Columbia, Connecticut, Cumberland, Rhine, and Tennessee Rivers, for which
reliable observations exist for more than 60 years, also show a good agreement
with the observations. The goodness of fit diminishes for streams for which
the number of observations is smaller and for which the data are not very
reliable.



Fig. 4. Mississippi River at Vicksburg (Miss.) 1890-1939
Observations, Table VI: recurrence intervals, exceedance intervals, return periods.
Theory, Table II, cols. 3 and 5; extrapolation.

6. Summary and conclusions. In order to apply any theory we have to
suppose that the data are homogeneous, i.e. that no systematic change of climate
and no important change in the basin have occurred within the observation
period and that no such changes will take place in the period for which
extrapolations are made. It is only under these obvious conditions that forecasts
can be made.

The theoretical return period T(x), the mean number of years between two
annual flood discharges greater than or equal to x, is a statistical function such
as the distribution w(x) or the probabilities W(x) and P(x). There are two
sets of observed values corresponding to the theoretical set: the exceedance
interval 'T(x_m), formula (9), and the recurrence interval "T(x_m), formula (9'),
x_m being the m-th flood discharge, where m is counted from below. As any
theory must include both notions, no separate theory for exceedance or
recurrence intervals is possible.

The return period T(x) of a flood discharge x is found by formula (39). For
large values of x the flood discharge converges toward a linear function (42) of
the logarithm of the return period. This is the scientific basis of Fuller's
empirical formula. The two constants of our formula, u and 1/a, are, respectively,
the most probable annual flood discharge and a multiple of the standard
deviation (28). Their values depend upon the drainage basin and known geological
and meteorological factors. It is beyond our present task to consider the
influence of these factors. Our method can be summarized by the following rules:

1) For each year find the maximum daily discharge x_m (do not use momentary
peaks) and arrange these n data in increasing magnitudes.

2) Calculate for each discharge x_m (m = 1, 2, ..., n - 1), the values log
'T(x_m) = log n - log (n - m) and plot the curves (x_m, log n/(n - m)) and
(x_{m+1}, log n/(n - m)). These are the observed exceedance and recurrence
intervals.

3) Calculate the annual mean flood \bar{x} and the annual mean squared flood
\overline{x^2}; determine according to (36)-(38) the standard deviation

s = \sqrt{\overline{x^2} - \bar{x}^2}

and the two constants

1/a = 0.77970 s,

u = \bar{x} - 0.57722/a.

4) The theoretical flood discharges x corresponding to the logarithm of the
return period T(x) given in Table II, col. 3, are obtained by the linear
transformation

x = u + y/a

where y is taken from Table II, col. 1. Plot x as a function of log T(x). For
large values of x and for extrapolation it is sufficient to use the linear asymptote
obtained graphically.
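The four rules can be sketched in code. This is an illustration under stated assumptions, not a transcription of formulas (36)-(39): the standard deviation is taken about the mean with divisor n, the logarithm is taken to base 10, and the return period is computed from the double-exponential law as T = 1/(1 - exp(-exp(-y))). All function names and the sample data are hypothetical.

```python
import math

INV_A_FACTOR = 0.77970   # multiplier of s in rule 3
U_FACTOR = 0.57722       # multiplier of 1/a in rule 3


def observed_intervals(annual_maxima):
    """Rules 1-2: order the floods, return (x_m, log10 n/(n-m)), m = 1..n-1."""
    x = sorted(annual_maxima)
    n = len(x)
    return [(x[m - 1], math.log10(n / (n - m))) for m in range(1, n)]


def fit_constants(annual_maxima):
    """Rule 3: return (u, 1/a) from the mean and the mean square."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    mean_sq = sum(v * v for v in annual_maxima) / n
    s = math.sqrt(mean_sq - mean * mean)   # assumed divisor n
    inv_a = INV_A_FACTOR * s
    u = mean - U_FACTOR * inv_a
    return u, inv_a


def theoretical_point(y, u, inv_a):
    """Rule 4: discharge x = u + y/a and log10 of the return period.

    The expression for T is the usual one for the double-exponential law
    and is an assumption about the form of formula (39).
    """
    x = u + inv_a * y
    T = 1.0 / (1.0 - math.exp(-math.exp(-y)))
    return x, math.log10(T)


if __name__ == "__main__":
    floods = [1820, 2410, 2050, 3100, 2760, 1990, 2300, 2650]  # made-up data
    u, inv_a = fit_constants(floods)
    print("u =", round(u, 1), " 1/a =", round(inv_a, 1))
    for y in (0.0, 1.0, 2.0, 3.0):
        print(y, theoretical_point(y, u, inv_a))
```

Plotting the pairs returned by `observed_intervals` together with the points from `theoretical_point` gives the kind of comparison shown in Figs. 1 and 4.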

The linear part of the theoretical curve (x, log T) permits of two interpreta¬ 
tions: First, T is the theoretical return period of a flood greater than or equal 
to x; second, x is the most probable flood to be reached within T years. The 
second interpretation holds for the straight line through the point (u, 0). 

The figures show a close agreement between observed and theoretical values. 





The observed curvature of the return periods is brought out by the theoretical 
graph. 

The agreement between theory and observation is excellent for floods which
correspond to reduced values of y ≤ 3. For the two or three extreme floods,
the return periods are based on a few observations and, consequently, the
agreement is not very good. No theory can be verified by two or three observations.
Generally speaking, the theory fits the observations as closely as could be
expected for such a complicated phenomenon.

In order to make a further test of our results, we need a numerical measure
for the weights to be given to the theoretical points. Therefore, for a given
probability we must find the corresponding theoretical limits for the observed
return periods. The theory of positional values will give these control curves.
Since it was the purpose of this article to develop and make clear the basic
method, we have refrained from introducing this subject.

It is our claim that the calculus of probabilities, and especially the theory of
largest values, is an efficient tool for the solution of certain hydrological problems.

REFERENCES 

[1] G. E. Becker and C. E. Van Orstrand, Hyperbolic Functions, Smithsonian
Mathematical Tables, Washington, 1931.

[2] A. Coutagne, "Etude statistique des débits de crue," Revue Générale de l'Hydraulique,
Paris (1937).

[3] A. Coutagne, "Etude statistique et analytique des crues du Rhône à Lyon," Comptes
Rendus du Congrès pour l'Utilisation des Eaux, Lyon (1938).

[4] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution
of the smallest and the largest member of a sample," Proc. Camb. Phil. Soc.,
Vol. 24 (1928).

[5] M. Fréchet, "Sur la loi de probabilité de l'écart maximum," Annales Soc. Polon.
Math., Vol. 6 (1927).

[6] Weston E. Fuller, "Flood flows," Trans. Am. Soc. Civil Eng., Vol. 77 (1914).

[7] Weston E. Fuller, E. Lane and others, "Discussion on flood flow characteristics,"
Trans. Am. Soc. Civil Eng., Vol. 89 (1926).

[8] John C. Geyer, "New curve fitting method for analysis of flood-records," Trans.
Am. Geophy. Union, Part II (1940), pp. 660-668.

[9] Robert Gibrat, "Aménagement hydro-électrique des cours d'eau. Statistique
mathématique et calcul des probabilités," Revue Générale de l'Electricité, Vol.
32, No. 15, 16, Paris (1932).

[10] Eugene L. Grant, "The probability-viewpoint in hydrology," Trans. Am. Geophy.
Union, Part I (1940), pp. 7-12.

[11] E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Annales de
l'Institut Henri Poincaré, Vol. 4 (1935), p. 116.

[12] E. J. Gumbel, "La plus grande valeur," Aktuárské Vědy, Vol. 6, No. 2, p. 83, No. 3,
p. 133, No. 4, p. 140, Prague (1935-36).

[13] E. J. Gumbel, La Durée Extrême de la Vie Humaine, Actualités Scientifiques et
Industrielles, Hermann et Cie, Paris, 1937.

[14] E. J. Gumbel, "Les intervalles extrêmes entre les émissions radioactives," Jour. de
Phys., Série 7, Vol. 8, No. 8, No. 11 (1937).

[15] Allen Hazen, Flood Flows, A Study of Frequencies and Magnitudes, John Wiley and
Sons, Inc., New York, 1930.

[16] Robert E. Horton, "Hydrologic conditions as affecting the results of the
application of methods of frequency analysis to flood records," Geological Survey
Water-Supply Paper 771, Washington (1936).

[17] Clarence S. Jarvis, "Floods in the United States, Magnitude and Frequency,"
Geological Survey Water-Supply Paper 771, Washington (1936).

[18] R. von Mises, "La distribution de la plus grande de n valeurs," Revue Math. de l'Union
Interbalkanique, Vol. 1, Athens (1936).

[19] T. Saville, "A study of methods of estimating flood flows applied to the Tennessee
River," Publications from College of Engineering, Nr. 6, New York (1935-36).

[20] J. J. Slade, "The reliability of statistical methods in the determination of flood
frequencies," Geological Survey Water-Supply Paper 771, Washington (1936).



ON THE FOUNDATIONS OF PROBABILITY AND STATISTICS 1 

By R. von Mises
Harvard University 

1. Introduction. The theory of probability and statistics which I have been
upholding for more than twenty years originates in the conception that the only
aim of such a theory is to give a description of certain observable phenomena,
the so called mass phenomena and repetitive events, like games of chance or
some specified attributes occurring in a large population. Describing means
here, in the first place, to find out the relations which exist between sequences
of events connected in some way, e.g. a sequence of single games and the sequence
composed of sets of those games or between a sequence of direct observations
and the so called inverse probability within the same field of observations. The
theory is a mathematical one, like the mathematical theory of electricity, based
on experience, but operating by means of mathematical processes, particularly
the methods of analysis of real variables and theory of sets.

We all know very well that in colloquial language the term probability or
probable is very often used in cases which have nothing to do with mass
phenomena or repetitive events. But I decline positively to apply the
mathematical theory to questions like this: What is the probability that Napoleon was a
historical person rather than a solar myth? This question deals with an
isolated fact which in no way can be considered as an element in a sequence of
uniform repeated observations. We are all familiar with the fact that, e.g., the
word energy is often used in everyday language in a sense which does not
conform to the notion of energy as adopted in mathematical physics. This
does not impair the value of the precise definition of energy used in physics and
on the other hand this definition is not intended to cover the entire field of daily
application of the term energy.

We discard likewise the scholastic point of view displayed in a sentence of this
kind: ". . . that both in its meaning and in the laws which it obeys, probability
derives directly from intuition and is prior to objective experience." This
sentence is quoted from a mathematical paper printed in a mathematical journal
of 1940. The same author continues calling probability a metaphysical problem
and speaking of the difficulties "which must in the nature of things always be
encountered when an attempt is made to give a mathematical or physical
solution to a metaphysical problem." In my opinion the calculus of probability
has nothing to do with metaphysics, at any rate not more than geometry or
mechanics has.


1 Address delivered on September 11, 1940 at a meeting of the Institute of Mathematical
Statistics in Hanover, N. H.




On the other hand we claim that our theory, which serves to describe
observable facts, satisfies all reasonable requirements of logical consistency and is
free from contradictions and obscurities of any kind. I am now going to outline
the essential ideas of the theory as developed by me since 1919 and I shall have
to refer as to the proof of its consistency to the recent work of A. H. Copeland,
of J. Herzberg and of A. Wald. Then I will give some examples of application
in order to show how the theory works and how it applies to actual problems in
statistics.

2. The notion of kollektiv. The basic notion upon which the theory is
established is the concept of kollektiv. We consider an infinite sequence of
experiments or observations every one of which supplies a definite result in the form
of a number (or a group of numbers in the case of a kollektiv of more than one
dimension). We shall designate briefly by X the sequence of results x_1, x_2,
x_3, .... In tossing a die we get for X an endless repetition of the integers one
to six, x = 1, 2, ..., 6. If we are interested in death probability, we observe a
large group of healthy 40 year old men and mark a one for each individual
surviving his 41st anniversary and a zero for each man who dies before, so that the
sequence x_1, x_2, x_3, ... consists of zeros and ones. In a certain sense the
kollektiv corresponds to what is called a population in practical statistics.
Experience shows that in such sequences the relative frequency of the different
results (one to six in the first of our examples, one and zero in the second) varies
only slightly, if the number of experiments is large enough. We are therefore
prompted to assume that in the kollektiv, i.e. in the theoretical model of the
empirical sequences or populations, each frequency has a limiting value, if the
number of elements increases endlessly. This limiting value of frequency is
called, under certain conditions which I shall explain later, the "probability of
the attribute in question within the kollektiv involved." The set of all limiting
frequencies within one kollektiv is called its distribution.

Let me insist on the fact that in no case is a probability value attached to a 
single event by itself, but only to an event as much as it is the element of a well 
defined sequence. It happens often that one and the same fact can be considered 
as an element of different kollektivs. It may then be that different probability 
values can be ascribed to the same event. I shall give a striking example of this, 
which we encounter in the field of actual statistical problems, at the end of this 
lecture. 

The objection has been made: Since all empirical sequences are obviously 
finite sequences, why then assume infinite kollektivs? Our answer is that any 
straight line we encounter in reality has finite length, but geometry is based on 
the notion of infinite straight lines and uses e.g. the notion of parallels which 
has no sense, if we restrict ourselves to segments of finite lengths. Another 
objection, often repeated, reads that there is a contradiction between the exist¬ 
ence of a frequency limit and the so called Bernoulli theorem which states that 
sequences of any length showing a frequency say ^ can also occur in cases for 





which the probability equals But it has been proved, in a rigorous way ex¬ 
cluding any doubt, that the two statements are compatible, even by explicit 
construction of infinite sequences fulfilling both conditions. I would even claim 
that the real meaning of the Bernoulli theorem is inaccessible to any probability 
theory that does not start with the frequency definition of probability. 

Now we are in the position to explain how our probability theory works.
This sequence of zeros and ones

(X) 101|001|100|011|110|011|010|111 ...

may represent the outcomes of a game of chance. The ones show gains, the
zeros losses for one of the two players. If we separate the terms of X into groups
of three digits and replace each group by a single one or zero according to the
majority of terms within the group, we get a new sequence

(X') 10011101 ...

which represents the gains and losses in sets of three games. Our task is now
to compute the distribution, i.e. the limiting frequencies of zeros and ones in
this new sequence X', assuming the two frequencies in X are known. A sequence
can formally be considered as a unique number like a decimal fraction with an
infinite number of digits. Then the transition from X to X' can be called a
transformation of a number, X' = T(X). As our sequences have to fulfill certain
conditions Copeland calls the sequences X, X' admissible numbers. What I
just quoted was of course a very special example of a transformation of a number.
But we have to emphasize that all problems dealt with in probability theory,
without any exception, have this unique form: The distribution or the limiting
frequencies in certain sequences are given, other sequences are derived from the
given ones by certain operations, and the distributions in these derived sequences
have to be computed. In other words: Probability theory is the study of
transformations of admissible numbers, particularly the study of the change of
distributions implied by such transformations.
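The transformation T of this example is easy to state as a procedure. The sketch below applies it to the finite initial segment of X printed above; the function name `majority_of_triples` is hypothetical.

```python
def majority_of_triples(bits):
    """Replace each complete group of three digits by its majority digit."""
    out = []
    usable = len(bits) - len(bits) % 3      # ignore an incomplete last group
    for i in range(0, usable, 3):
        group = bits[i:i + 3]
        out.append("1" if group.count("1") >= 2 else "0")
    return "".join(out)


# Initial segment of the sequence X given in the text (separators removed).
X = "101001100011110011010111"

if __name__ == "__main__":
    print(majority_of_triples(X))
```

Applied to the 24 digits shown, the function returns the eight digits 10011101 printed for X'.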

We know four and only four simple, i.e. irreducible transformations or four 
fundamental operations. They are called selection, mixing, partitioning and 
combination. By combining these basic processes we can settle all problems 
in probability theory. The formal, mathematical difficulties in carrying out the 
computation of the new distributions may become very serious in certain cases, 
particularly if we have to apply an infinite number of transformations (asymp¬ 
totic problems). But, in the clearly defined framework of this theory no space 
is left for any metaphysical speculations, for ideas about sufficient reason or in¬ 
sufficient reason, for notions like degree of evidence or for a special kind of prob¬ 
ability logic and so on. And further no modification is needed for handling usual 
statistical problems: Terms like inverse probability, likelihood, confidence 
degrees, etc. are justified and admitted only as far as they are capable of being 
reduced to the basic notion of kollektiv and distribution within a kollektiv. I 
will give some more details on this point later. Meanwhile let me turn to a
general question which, in a certain way, is the crucial point in establishing the
new probability theory.

3. Place selections and randomness. It is obvious that we have to restrict
still further the notion of kollektiv or the field of sequences which can be
considered as the objects of a probability investigation. The successive outcomes
of a game of chance differ very clearly from any regular sequence as defined by a
simple arithmetical law, e.g. the regularly alternating sequence 0 1 0 1
0 1 0 1 .... A typical property which singles out the irregular or random
sequences and which has to be reproduced in every probability theory is that, if
p is the probability of encountering a one in the sequence, then p^2 is the
probability of two ones following each other immediately. Any probability theory has
to introduce an axiom which enables us to deduce this theorem and others of a
similar type. The question is only how to find a sufficiently general and
consistent form for it. The procedure I have chosen consists in using a special kind
of transformation of a sequence, which I call a place selection.

A place selection is defined by an infinite set of functions s_n(x_1, x_2, ..., x_{n-1})
where x_1, x_2, ... are the digits of an admissible number or a kollektiv and
s_n has one of the two values zero or one. Here s_n = 1 means that the nth digit
of the sequence is retained, s_n = 0 means that it is discarded. The decision
about retaining or discarding the nth element depends, as you see, only on the
preceding values x_1, x_2, ..., x_{n-1}, but not on x_n or the following digits. Example
of a place selection:

s_n = 1, if x_{n-1} = 0 for prime numbers n,
         if x_{n-1} = 1 for n not prime,

s_1 = 1, and s_n = 0 in all other cases.

Experience shows that, if we apply such a place selection to the sequence X 
of outcomes of a game of chance, we get a new, selected sequence S(X) in which 
the frequencies of gains and losses are about the same as in X. This fact or 
the practical impossibility of a gambling system suggests the adoption of the 
following procedure in handling transformations of admissible numbers. 
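The example place selection can be tried on a pseudo-random sequence. The following is a rough sketch: a pseudo-random generator merely stands in for a game of chance, the first digit is always retained (an assumption about the case n = 1), and the names `is_prime`, `place_selection` and `simulate` are hypothetical.

```python
import random


def is_prime(n):
    """Trial-division primality test, adequate for a small illustration."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def place_selection(x):
    """Apply the example selection to digits x_1, x_2, ... (x[0] holds x_1).

    The decision about the n-th digit uses only the preceding digit,
    never the n-th digit itself.
    """
    kept = []
    for n in range(1, len(x) + 1):
        if n == 1:
            keep = True                    # assumption: s_1 = 1
        elif is_prime(n):
            keep = (x[n - 2] == 0)         # prime n: retain if x_{n-1} = 0
        else:
            keep = (x[n - 2] == 1)         # n not prime: retain if x_{n-1} = 1
        if keep:
            kept.append(x[n - 1])
    return kept


def simulate(p=0.3, length=20000, seed=0):
    """Frequency of ones in X and in the selected sequence S(X)."""
    rng = random.Random(seed)
    x = [1 if rng.random() < p else 0 for _ in range(length)]
    selected = place_selection(x)
    return sum(x) / len(x), sum(selected) / len(selected)


if __name__ == "__main__":
    before, after = simulate()
    print(f"frequency of ones in X:    {before:.3f}")
    print(f"frequency of ones in S(X): {after:.3f}")
```

For a sequence of this kind the two printed frequencies should lie close together, which is the empirical fact the text appeals to. Applied to the regular sequence 0 1 0 1 ..., the same selection would change the frequencies drastically.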

First, if within a certain investigation the transformation applied to X is a
place selection, we assume that the distribution in X' = S(X) is the same as
in X: distr S(X) = distr X. Second, if a general transformation T is applied
to X, say X' = T(X), then we examine whether the existence of a place selection
S that changes the distribution in X' (so as to have distr S(X') ≠ distr X')
implies the existence of a place selection S_1 that would affect the distribution in
X (so as to give distr S_1(X) ≠ distr X). If this is the case, we say that X' is
a kollektiv, provided that the original sequence X was considered to be a
kollektiv. Take e.g. for X the sequence resulting from tossing a die endlessly, and
call p_1, p_2, ..., p_6 the limiting frequencies of the six possible outcomes 1, 2, ..., 6.
The transformation T may consist in replacing every 1 in the sequence X by a
2, every 3 by a 4, and every 5 by a 6. The new sequence consists of only three
different kinds of elements 2, 4, 6 and therefore its distribution includes only
three values p'_2, p'_4, p'_6 where evidently p'_2 = p_1 + p_2 etc. Here it is almost
obvious that if a place selection applied to X' changes the value of p'_2, the same
selection if applied to X must change either p_1 or p_2. So, if the original sequence
X was considered as a kollektiv, X' has to be admitted too.

Now the question arises whether this procedure is in itself consistent or
whether it can lead to contradictions. We were concerned up to now with
kollektivs the elements of which belong to a finite set of distinct numbers
c_1, c_2, ..., c_k and the distributions of which are therefore defined by k
non-negative values p_1, p_2, ..., p_k with the sum 1. In this case it was pointed out
by Wald and by Copeland that, if an arbitrary distribution and an arbitrary
countable set Σ of place selections are given, there exists a continuum of
sequences every one of which has the given distribution, which is not affected by
any place selection belonging to Σ. Now it may be supposed that in a concrete
problem a sequence X' is derived from a sequence X by a finite number of
fundamental operations involving a finite set Σ' of place selections. Another
finite set Σ'' may consist of selections employed in establishing that certain
sequences used in the derivation of X' are "combinable" ones. Finally an
arbitrary countable set Σ of selections S may be assumed. According to our
procedure we have shown that to any place selection S which affects the
distribution in X' corresponds a certain S_1 which, when applied to X, changes the
distribution of X. All these S_1 corresponding to the elements S of Σ form a
countable set Σ_1. Now the set Σ_2 including Σ', Σ'', Σ_1 and also including all
products of two of its own elements is a countable set too. What we use in
computing the distribution of X' is only the fact that the given sequence X is
unaffected by the selections that are elements of Σ_2. It follows from the above
quoted results that we can substitute for X a numerically specified sequence
and carry out all operations upon this specified sequence. So it is proved that
no contradiction can arise in computing the final probability according to our
conception.

I cannot enter here into a discussion of the more complicated case where the
range within which the elements of a kollektiv vary is an infinite one, either a
countable set or a continuum. All principal problems connected with
establishing the notion of kollektiv can be settled satisfactorily, at any rate, by
considering those general forms of sequences as limiting cases of kollektivs with a
finite set of attributes.

4. Example: Set-of-games problem. I want to present now a simple, but
instructive example to show how the theory works and what task a mathematical
foundation of the calculus of probability has to achieve. Let us recall the two
sequences X and X' composed of zeros and ones of which we spoke above. The
first represented the outcomes of a sequence of single games, the second the
outcomes of triple sets of those games. If X is considered as a kollektiv with
given probabilities p and q for one and zero, it is easy to deduce the
corresponding values p' and q' for X' and to show that X' is a kollektiv too. We begin by
carrying out three selections which single out from the original sequence x_1,
x_2, x_3, ... first, the elements x_1, x_4, x_7, ...; second, the elements x_2, x_5, x_8, ...;
and third, the elements x_3, x_6, x_9, .... It can be shown by means of certain
further place selections that these three kollektivs, which we call X_1, X_2, X_3,
are combinable. That means that combining the corresponding elements of
the three sequences like x_1 x_2 x_3, x_4 x_5 x_6, x_7 x_8 x_9, ... leads to a new three
dimensional kollektiv in which each permutation of three digits 0 and 1 has a
probability equal to the corresponding product of p- and q-factors. For
instance the probability of encountering the group 111 is p^3 and for the group 110
it is p^2 q. Now we operate a mixing upon this kollektiv by collecting all permutations
with two or three ones. We find in a well known way the sum p^3 + 3 p^2 q for
the probability p' of ones in the sequence X'. So far the result is very well
known and can be reached (in my opinion, in a very incomplete and
unsatisfactory way) also by the classical methods.

But what I want to discuss here is a slightly modified question. If the
sequence X means gains and losses for single games and if the arrangement for
sets of three games is made as indicated before, then in a real play the gains
and losses of sets are counted in a different way. For, if the first two games of
a set are both won or lost by the same player, the fate of the set is decided and
there is no sense to play the third game. So the loss of the second set in our
example will already be recognized after the fifth game and the actual sixth
game will be considered as the first game of the third set. In this way the
original sequence X decomposed into groups of two or three games

(X) 101|00|11|00|011|11|00|11|010|11| ...

leads to a new sequence X''

(X'') 1010110101 ...

which is obviously different from X'. Everyone familiar with the usual
handling of the probability concept will say that in X'' the probabilities of zeros and
ones must be the same as in X'. But a mathematical foundation of theory of
probability, if it deserves this name, has to clear up the question: From what
principles or particular assumptions and by what inferences may we deduce the
equality of the limiting frequencies in X' and X''?
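The modified counting rule, and the equality that is to be justified, can be illustrated as follows. This sketch does not reproduce the argument by place selections that the text has in mind. It only carries out the decomposition on the printed digits and checks, by elementary algebra with exact fractions, that p^2 + 2 p^2 q equals p^3 + 3 p^2 q when p + q = 1. The function names are hypothetical.

```python
from fractions import Fraction


def sets_with_early_stop(bits):
    """Decompose single games into sets that stop once a set is decided."""
    results, i = [], 0
    while i + 1 < len(bits):
        if bits[i] == bits[i + 1]:         # decided after two games
            results.append(bits[i])
            i += 2
        elif i + 2 < len(bits):            # one game each: third game decides
            results.append(bits[i + 2])
            i += 3
        else:
            break                          # last set incomplete
    return "".join(results)


def p_set_full(p):
    """Probability of winning a set when all three games are played."""
    q = 1 - p
    return p**3 + 3 * p**2 * q


def p_set_early_stop(p):
    """Probability of winning a set that stops as soon as it is decided."""
    q = 1 - p
    return p**2 + 2 * p * q * p            # win-win, or one each then a win


# Initial segment of X as printed in the text (separators removed).
X = "10100110001111001101011"

if __name__ == "__main__":
    print(sets_with_early_stop(X))
    p = Fraction(1, 3)
    print(p_set_full(p), p_set_early_stop(p))
```

The decomposition of the printed digits gives 1010110101, the sequence X'' shown above, and for p = 1/3 both expressions equal 7/27. The identity of the two polynomials does not by itself answer the question raised in the text, which concerns the limiting frequencies in sequences derived from one and the same X.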

There is no difficulty in solving this problem from the point of view of the
frequency theory. We have only to apply somewhat different place selections
instead of the above used which lead to the kollektivs X_1, X_2, X_3. I showed
elsewhere how the general set-of-games problem can be satisfactorily treated in
this way. Here I want to stress only that the problem as a whole is completely
inaccessible by any of the other known approaches to probability theory. The
classical point of view which starts with the notion of equally likely cases and
rests upon a rather vague idea of the relationship between probability and
sequences of events does not even allow the formulation of the problem. In
the so called modernized classical theory, as proposed by Fréchet, probabilities
are defined as "physical magnitudes of which frequencies are measures."
Fréchet would say that the frequencies both in X' and in X'' are measures of
the same quantity. But why? We face here obviously a mathematical
question which cannot be settled by referring to physical facts. It is clear that the
equality of the distributions in the two sequences X' and X'' is due to the
randomness or irregularity of the original sequence X. No theory which does
not take into account the randomness, which avoids referring to this essential
property of the sequences dealt with in probability problems, can contribute
anything toward the solution of our question.

I have to make some special remarks about the so-called measure theory of 
probability. 2 

5. Probability as measure. Up to now we have been concerned only with
the simplest type of kollektivs, namely, with those sequences the elements of
which belong to a finite set of numbers so as to have a distribution consisting
of a finite number of finite probabilities with the sum 1. It may be true that
all practical problems, in a certain sense, fall into this range. For, the single
result of an observation is always an integer, the number of smallest units
accessible to the actual method of measuring. Nevertheless in many cases it
is much more useful to adopt the point of view that the possible outcomes of an
experiment belong to a more general set of numbers, e.g. to a continuous segment
or any infinite variety. If we include the case of kollektivs of more than one
dimension, we have to consider a point set in a k-dimensional space (where
even k may be infinite) as the label set or attribute set of the kollektiv. In
order to define the probability in this case we have to choose a subset A of the
label set and to count among the first n elements the number n_A of those elements
the attributes of which fall into A. Then the quotient n_A/n is the frequency,
and its limiting value for n infinite will be called the probability of the attribute
falling into A within the given kollektiv.

It was rightly stressed by many authors that in the case of an infinite label set
some additional restrictions must be introduced. In particular A. Kolmogoroff
set up a complete system of such restrictions. We cannot ask for the existence
of the limiting frequency in any arbitrary subset A. It will be sufficient
to assume that the limit exists for a certain Körper or a certain additive family
of subsets. If it exists for two mutually exclusive subsets A and B, the limit
corresponding to A + B will be, by virtue of the original definition, the sum of
the limits connected with A and B. We can now insert a further axiom involving
the complete additivity of the limiting values. So we arrive at the statement

2 What I call measure theory here is essentially that proposed by Kolmogoroff in his
pamphlet of 1933. As to the new theory developed by Doob in his following paper (where 
instead of the label space the space of all logically possible sequences is used in establishing 
the measures) see my comment on page 216. 





that probability is the measure of a set. All axioms of Kolmogoroff can be
accepted within the framework of our theory as a part of it, but in no way as a
substitute for the foregoing definition of probability.

Occasionally the expression probability as measure theory is used in a different
sense. One tries to base the whole theory on the special notion of a set
of measure zero. One of the basic assumptions in my theory is that in the
sequence of results we obtain in tossing a so-called correct die the frequency,
say of the point 6, has a certain limiting value which equals 1/6. A different
conception consists in stating that anything can happen in the long run with a
correct die, even that an uninterrupted sequence of six's or an alternating
sequence of two's and four's or so on may appear. Only all these events which
do not lead to the limiting frequency 1/6 form, together as a whole, a set of
events of measure zero. Instead of my assumption, the limiting value is 1/6,
we should have to state: It is almost certain that a limit exists and equals 1/6.
Nothing can be said against such an alluring assumption from an empirical
standpoint, since actual experience extends in no case to an infinite range of
observations. The only question is whether the assumption is compatible with
a complete and consistent theory. I cannot see how this may be achieved.
Before saying that a set has measure zero we have to introduce a measure system,
which can be done in innumerable ways. If e.g. we denote the outcome six by a
one and all other outcomes 1 to 5 by zero, we get as the result of the game with
a die an infinite sequence of zeros and ones. It has been shown by Borel that
according to a common measure system the set of all 0, 1 sequences which do not
have the limiting frequency 1/2 has the measure zero. In this way it turns out
to be almost certain that the limiting frequency of the outcome six in the case
of a correct die is 1/2. Other values for the limit can be obtained by a similar
inference. It is a correct but misleading idea that the measure zero is unaffected
by a regular (continuous) transformation of the assumed measure system, since
in our field of problems different measures which are not obtained from one
another by a regular transformation have equal rights. So, saying that a certain
set has the measure zero makes in our case no more sense than to state that an
unknown length equals 3 without indicating the employed unit.

In recapitulating this paragraph I may say: First, the axioms of Kolmogoroff
are concerned with the distribution function within one kollektiv and are
supplementary to my theory, not a substitute for it. Second, using the notion of
measure zero in an absolute way, without reference to the arbitrarily assumed
measure system, leads to essential inconsistencies.

6. Statistical estimation. Let me now turn to the last point, the application
of probability theory to one of the most widely discussed questions in today’s 
statistical research: the so-called estimation problem. Many strongly divergent 
opinions are facing each other here. I think that the probability theory based 
on the notion of kollektiv is best able to settle the dispute and to clear up the 
difficulties which arose in the controversies of different writers. 





We may, without loss of generality, restrict ourselves to the simplest case
of a single statistical variable x and a single parameter $\vartheta$, where x of course may
be the arithmetical mean of n observed values. Here (and likewise in the case
of more variables and more parameters) we have to distinguish carefully among
four different kollektivs which are simultaneously involved in the problem.
The range within which both x and $\vartheta$ vary will be assumed to be a continuous
interval so that all distributions will be given by probability densities.

The first kollektiv we deal with is a one-dimensional one where the probability
of x falling into the interval x, x + dx depends on x and on a parameter $\vartheta$. If

$$p(x \mid \vartheta) \tag{1}$$

denotes the corresponding density and the limits A, B within which x possibly
falls depend on $\vartheta$ too, we have

$$\int_{A(\vartheta)}^{B(\vartheta)} p(x \mid \vartheta)\,dx = 1 \quad \text{for each } \vartheta. \tag{1'}$$


In order to fix the ideas we may imagine that the first kollektiv consists in
drawing a number x out of an urn and that $\vartheta$ characterizes the contents of the
urn. Asking for an estimate of $\vartheta$ implies the assumption that different possible
urns are at our reach every one of which can be used for drawing the x. The $\vartheta$
values for the different urns fall into a certain interval C, D. It is usual to suppose
that the urns are picked out at random so as to give another one-dimensional
kollektiv with the independent variable $\vartheta$. Let $p_0(\vartheta)\,d\vartheta$ be the probability of
picking an urn with the characteristic value falling into the interval $\vartheta$, $\vartheta + d\vartheta$.
This density

$$p_0(\vartheta) \tag{2}$$

is often called the prior or a priori probability of $\vartheta$. As the range within which
$\vartheta$ varies is confined by the constants C and D, we have obviously

$$\int_C^D p_0(\vartheta)\,d\vartheta = 1. \tag{2'}$$


Now from these two one-dimensional kollektivs with the variables x in the
first, $\vartheta$ in the second, we deduce by combination (multiplication) a two-dimensional
kollektiv with the density function

$$P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta). \tag{3}$$

The individual experiment which forms the element of this third kollektiv consists
of picking at random an urn and drawing afterwards from this urn. Both
x and $\vartheta$ are now independent variables (attributes of the kollektiv) and it is easy
to see that it follows from (1) and (2)

$$\int_C^D \int_{A(\vartheta)}^{B(\vartheta)} P(\vartheta, x)\,dx\,d\vartheta = 1. \tag{3'}$$





We will return later to this two-dimensional kollektiv. Let us, first, derive
from it, by applying the operation of partitioning (Teilung), our fourth and last
kollektiv which is one-dimensional again. Partitioning means that we drop
from the sequence of experiments which form the third kollektiv all those for
which the x-value falls outside a certain interval x, x + dx, and that in this
way we consider a partial sequence of experiments with only the one variable $\vartheta$.
The distribution of $\vartheta$-values within this sequence with quasi-constant x is given,
according to the well known rule of division or rule of Bayes (a rule which can
be proved mathematically) by³


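$$p_1(\vartheta \mid x) = \frac{P(\vartheta, x)}{\int_C^D P(\vartheta, x)\,d\vartheta} = c(x)\,p_0(\vartheta)\,p(x \mid \vartheta). \tag{4}$$

It follows immediately that

$$\int_C^D p_1(\vartheta \mid x)\,d\vartheta = 1. \tag{4'}$$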


This function $p_1$ of $\vartheta$ depending on the parameter x is generally called the
posterior or a posteriori probability of $\vartheta$.
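The rule of division (4) lends itself to a small numerical illustration. What follows is a sketch under stated assumptions, not a prescription from the text: the interval C, D is taken as 0, 1, the integral is replaced by a sum over a uniform grid, and the flat prior, the normal form of the density and all function names are hypothetical choices made only for the example.

```python
import math

def posterior(prior, density, thetas, x):
    """Discretized form of (4): p1(theta | x) on a uniform grid of theta values."""
    step = thetas[1] - thetas[0]
    joint = [prior(t) * density(x, t) for t in thetas]   # P(theta, x) as in (3)
    norm = sum(joint) * step                             # approximates the integral over C..D
    return [v / norm for v in joint]

# Assumed for illustration only: C = 0, D = 1, a flat prior, and a normal
# density for x centred at theta with standard deviation 0.1.
def flat_prior(theta):
    return 1.0

def normal_density(x, theta, sd=0.1):
    return math.exp(-0.5 * ((x - theta) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

m = 1000
thetas = [(i + 0.5) / m for i in range(m)]   # midpoints of a grid on (0, 1)
p1 = posterior(flat_prior, normal_density, thetas, x=0.3)

step = thetas[1] - thetas[0]
total = sum(p1) * step                       # check of (4'): should be very close to 1
# Probability that theta lies in the part [0.2, 0.4] of the interval C, D.
part = sum(v for t, v in zip(thetas, p1) if 0.2 <= t <= 0.4) * step
print(total, part)
```

Once the posterior values are available, the probability of any part of the interval C, D is obtained by summation, which is the discrete counterpart of the integration mentioned in the text.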

If $p_1(\vartheta \mid x)$ can be computed according to the formula (4), every question concerning
the "presumable" value of $\vartheta$ as drawn from the outcome x of an experiment
is completely answered. We can find indeed, by integration, the
probability which corresponds to any part of the interval C, D of $\vartheta$ and so the
estimation problem is definitely solved. But the trouble is that in most cases of
practical application nothing or almost nothing is known about the prior probability
$p_0(\vartheta)$ which appears as a factor in the expression of $p_1$. Hence arises
the new question: What can we say about the $\vartheta$-values without having any information
about its prior probability? This is the estimation problem as it is generally
conceived today.

The first successful approach to the answering of this question was made by
Gauss. If we do not know $p_1$, we know however, except for a constant factor,
the quotient $p_1/p_0$, posterior probability to prior probability, which equals
$c\,p(x \mid \vartheta)$. The maximum of this quotient must be greater than one, since the
average values of both $p_0$ and $p_1$ are the same. So the maximum means the
point of the greatest increase produced by the observed experimental value of x
upon the probability of $\vartheta$. It seems reasonable to assume the $\vartheta$-value for which
the ratio $p_1/p_0$ reaches its maximum as an estimate for $\vartheta$: It is the value upon
which the greatest emphasis is conferred by the observation. This idea, originally
proposed by Gauss in his theory of errors, has been later developed chiefly
by R. A. Fisher, and is known today as the maximum likelihood method. Calling
the ratio $p_1/p_0$ likelihood seems indeed an adequate nomenclature.
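That the point of maximum of the ratio of posterior to prior probability does not depend on the prior can be checked on a grid. The sketch below is illustrative only: the normal form of p(x | ϑ), the two priors, the grid on (0, 1] and the function names are assumptions introduced here.

```python
import math

def normal_density(x, theta, sd=0.1):
    # Assumed form of p(x | theta), chosen only for illustration.
    return math.exp(-0.5 * ((x - theta) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def likelihood_ratio(prior, thetas, x):
    """Return the grid values of p1(theta | x) / p0(theta)."""
    step = thetas[1] - thetas[0]
    joint = [prior(t) * normal_density(x, t) for t in thetas]
    norm = sum(joint) * step
    return [(j / norm) / prior(t) for j, t in zip(joint, thetas)]

def max_likelihood_estimate(prior, thetas, x):
    """Return the theta value maximizing the ratio, together with the maximum."""
    ratio = likelihood_ratio(prior, thetas, x)
    best = max(range(len(thetas)), key=ratio.__getitem__)
    return thetas[best], ratio[best]

thetas = [i / 1000 for i in range(1, 1001)]          # grid on (0, 1]
priors = {"flat": lambda t: 1.0, "linear": lambda t: 2.0 * t}
for name, prior in priors.items():
    print(name, max_likelihood_estimate(prior, thetas, x=0.3))
```

Both priors lead to the same estimate, while the size of the maximum differs from one prior to the other because the constant factor c depends on the prior; in either case the maximum exceeds one.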

3 For brevity Bayes' rule is employed in the text as in the case of a discontinuous
distribution. The correct procedure in the case of a continuous x would require that we first
use finite intervals and then pass to the limit.





The method of estimation used most frequently today is not the maximum
likelihood method, but the so-called confidence interval method, inaugurated
by R. A. Fisher and now successfully extended and applied by J. Neyman. This
method uses the third of the above mentioned kollektivs instead of the fourth,
i.e. the two-dimensional probability $P(\vartheta, x)$. At first sight it seems hopeless
to use this function which includes the unknown prior probability $p_0(\vartheta)$ as a
factor. But it turns out, as Neyman has shown⁴ (and this is the decisive idea
of the confidence interval method), that we can indicate in the x, $\vartheta$-plane special
regions for which the probability $\iint P(\vartheta, x)\,dx\,d\vartheta$ is independent of $p_0(\vartheta)$. In
fact, if we point out for every $\vartheta$ such an interval $x_1$, $x_2$ as to have

$$\int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\,dx = \alpha, \qquad 0 < \alpha < 1, \tag{5}$$

it follows immediately from (2) and (5) for the region covered by these intervals

$$\int_C^D \int_{x_1(\vartheta)}^{x_2(\vartheta)} P(\vartheta, x)\,dx\,d\vartheta = \int_C^D p_0(\vartheta)\,d\vartheta \int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\,dx = \alpha. \tag{6}$$

For given $\alpha$ the intervals can be chosen in different ways. If we choose $x_1 = A$
for $\vartheta = C$ and $x_2 = B$ for $\vartheta = D$, we get a strip or belt, as shown in Fig. 1,
which supplies for every given x a smallest value $\vartheta_1$ and a greatest value $\vartheta_2$.
The definition of our third kollektiv leads to the conclusion: If we predict each
time a certain x is observed that $\vartheta$ lies between the corresponding $\vartheta_1$ and $\vartheta_2$, then
the probability is $\alpha$ that we are right, whatever the prior probability may be.⁵ It is

4 J. Neyman, Roy. Stat. Soc. Jour., Vol. 97 (1934), pp. 690-92.

5 After my lecture Dr. A. Wald called my attention to Neyman's suggestion, namely
that this statement can be generalized by admitting that the infinite sequence of $\vartheta$-values
which results from picking out successively the urns for drawing a number x, does not
fulfill the conditions of a kollektiv. So, instead of the terms "whatever the prior probability
may be" we can say "whatever the method of picking out the urns may be." In
fact, let us consider the case where $\vartheta$ can assume only a finite number of values $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$.
Among the n first trials let $n_\kappa$ be the number of cases where $\vartheta = \vartheta_\kappa$ and $n'_\kappa \le n_\kappa$ the
number of cases where $\vartheta = \vartheta_\kappa$ and x falls into the interval $x_1(\vartheta_\kappa)$, $x_2(\vartheta_\kappa)$. The relative




understood that, in this argument, both x and $\vartheta$ are variables the values of which
may change from one trial to the next. I cannot agree with the statement,
which is often made, that x only is a variable and $\vartheta$ a constant or that we are
only interested in one specified value of $\vartheta$. In no way is it possible, in the
framework of the confidence limits method, to avoid the idea of a so-called
superpopulation, i.e. the existence of a manifold of urns every one of which forms
a kollektiv.⁶ Thus no contradiction and no antagonism exists between this
method and the Bayes formula. Only a different kollektiv, a two-dimensional
instead of a one-dimensional, is here considered.
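The independence of the confidence statement from the method of picking the urns can be illustrated by a simulation. This is a sketch under assumptions that are not in the text: p(x | ϑ) is taken to be normal with mean ϑ and unit standard deviation, so that the x-interval in (5) is ϑ − z, ϑ + z, and the three ways of picking ϑ, the seed and the function names are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(draw_theta, alpha=0.90, trials=100_000, seed=1):
    """Relative frequency with which the prediction x - z <= theta <= x + z is right."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + alpha / 2)   # half-width of the x-interval in (5)
    hits = 0
    for i in range(trials):
        theta = draw_theta(rng, i)              # pick an urn
        x = rng.gauss(theta, 1.0)               # draw x from that urn
        if theta - z <= x <= theta + z:         # same event as x - z <= theta <= x + z
            hits += 1
    return hits / trials

# Three hypothetical ways of picking the urns; the last one is not random at all.
pickers = {
    "uniform prior": lambda rng, i: rng.uniform(0.0, 10.0),
    "skewed prior": lambda rng, i: rng.expovariate(1.0),
    "fixed cycle": lambda rng, i: float(i % 3),
}
for name, picker in pickers.items():
    print(name, coverage(picker))
```

In each case the frequency of correct predictions comes out close to α, although ϑ changes from trial to trial in a different manner. The third picker corresponds to the generalization mentioned in footnote 5, where the sequence of ϑ-values need not be a kollektiv.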

I have no time to enter here in a discussion of the very interesting developments
of Neyman's theory which are intended to supply additional conditions
in order to determine the arbitrary choice of the x-intervals in a unique way.
May I only mention that what is called in Neyman's theory the probability of a
second type error in testing the hypothesis $\vartheta = \vartheta_0$ is given by the expression

$$\int_C^D \int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} P(\vartheta, x)\,dx\,d\vartheta = \int_C^D p_0(\vartheta)\,d\vartheta \int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\,dx. \tag{7}$$

If we want to determine the confidence belt or the intervals $x_1$, $x_2$ in such a way
as to minimize this expression independently of the function $p_0(\vartheta)$, we obtain
Neyman's maximum power condition

$$\int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\,dx \equiv F(\vartheta, \vartheta_0) = \min. \quad \text{for each pair } \vartheta, \vartheta_0. \tag{8}$$

This condition, it is well known, cannot be fulfilled under general assumptions
for $p(x \mid \vartheta)$. Moreover the above-mentioned boundary conditions $x_1(C) = A(C)$
and $x_2(D) = B(D)$ (or similar ones in other cases) have to be considered
too. If they are not satisfied, the statement which can be made with probability
$\alpha$ would include the prediction that certain x-values are impossible. Except
for this case the above formulated theorem is equally valid for every region
determined according to (5).

It is clear that if the original distribution is given by a regular, slightly varying
function $p(x \mid \vartheta)$, the confidence limits method cannot give very substantial
results. Let us take e.g. for $p(x \mid \vartheta)$ the uniform distribution

$$p(x \mid \vartheta) = 1/\vartheta \quad \text{for } 0 \le x \le \vartheta, \quad 0 \le \vartheta \le 1. \tag{9}$$


frequency of correct predictions is then $(n'_1 + n'_2 + \cdots + n'_k) : n$ where n equals $n_1 + n_2 + \cdots + n_k$.
If n tends to infinity, at least one part of the $n_\kappa$ must become infinite; for those
the limit of $n'_\kappa / n_\kappa$ tends to $\alpha$ according to (5) while the other terms (with finite $n_\kappa$ and $n'_\kappa$)
have no influence. So the limiting value of the frequency $(n'_1 + n'_2 + \cdots + n'_k) : n$ equals
in any event $\alpha$. This generalization does not apply, if we ask for the probability of a second
type error of the hypothesis $\vartheta = \vartheta_0$. Here the existence of the prior probability $p_0$ is
essential.

6 According to the generalization supplied by Neyman's point of view (Phil. Trans.
Roy. Soc., Vol. A-236 (1937), pp. 333-380) which is discussed in footnote 5, the superpopulation
does not necessarily satisfy the conditions of a kollektiv.





We have here A = 0, B = $\vartheta$, C = 0, D = 1 and the domain in which x and $\vartheta$
vary is the 45° right triangle shown in Fig. 2. Whatever $p_0(\vartheta)$ may be, the
integral of $P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta)$ over this domain is 1 and if we omit the
part of the triangle on the left of the straight line $x = (1 - \alpha)\vartheta$, the integral
over the remaining part is $\alpha$. For $\alpha = 0.90$, a statement which can be made
with a probability of 90% reads: The value of $\vartheta$ lies between x and 10x. On
the other hand we know from the very beginning with 100% certainty that $\vartheta$
lies between x and 1, so that for $x \ge 0.1$ the statement is futile. (If one chooses
as confidence belt the part on the left of the straight line $x = \alpha\vartheta$, the statement
would run: $\vartheta$ lies between 1.1x and 1 and values of x greater than 0.9 are
impossible.) If we apply in this case the Bayes formula, we find that the outcome
depends to the highest extent on what is known about the prior probability
$p_0(\vartheta)$.
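The uniform example can be followed numerically. The sketch below simulates the two-dimensional kollektiv for the density (9) under two arbitrary priors on the interval 0, 1; the priors, the seed and the function name `simulate` are assumptions made for the illustration.

```python
import random

def simulate(draw_theta, alpha=0.90, trials=100_000, seed=2):
    """Return (frequency of correct statements, frequency of futile statements)."""
    rng = random.Random(seed)
    right = futile = 0
    for _ in range(trials):
        theta = draw_theta(rng)              # urn picked according to some prior on [0, 1]
        x = rng.uniform(0.0, theta)          # density (9): uniform on [0, theta]
        if (1 - alpha) * theta <= x:         # x in the belt, i.e. x <= theta <= x / (1 - alpha)
            right += 1
        if x >= 1 - alpha:                   # upper limit x / (1 - alpha) is at least 1
            futile += 1
    return right / trials, futile / trials

# Two hypothetical priors: a flat one, and one concentrated near theta = 1.
print(simulate(lambda rng: rng.random()))
print(simulate(lambda rng: rng.betavariate(8, 1)))
```

The first number of each pair stays near α for both priors. The second number, the share of observations for which the statement adds nothing to what was known beforehand, changes considerably with the prior.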

Fig. 2

In most cases however which present themselves in practical statistics the
original density function $p(x \mid \vartheta)$ has a different character from that assumed in
(9). It depends generally on an integer n and the distribution is concentrated
more and more when n increases. (We may define here concentration as
standard deviation tending towards zero. The integer n means in general the
number of basic experiments.) We have e.g. in the so-called Bayes problem
where x is the arithmetical mean of n observations the asymptotic expression
for p:


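$$p(x \mid \vartheta) \sim \sqrt{\frac{n}{2\pi\,\vartheta(1-\vartheta)}}\; e^{-n(x-\vartheta)^2 / 2\vartheta(1-\vartheta)}, \qquad 0 \le x \le 1, \quad 0 \le \vartheta \le 1. \tag{10}$$

If we denote by $\Phi$ the probability integral

$$\Phi(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-z^2}\,dz, \tag{11}$$

the x-intervals corresponding to a given probability value $\alpha$ are defined by

$$x_1 = \vartheta - \xi, \quad x_2 = \vartheta + \xi \quad \text{where} \quad \Phi\!\left(\xi \sqrt{\frac{n}{2\vartheta(1-\vartheta)}}\right) = \alpha. \tag{12}$$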




If n has a large value, the $\xi$'s are very small and we get a narrow belt along the
straight line $x = \vartheta$ as shown in Fig. 3 for $\alpha = 0.90$ and n about 100. The
prediction which can be made with the probability $\alpha$ reads approximately

$$x - \eta \le \vartheta \le x + \eta \quad \text{where} \quad \Phi\!\left(\eta \sqrt{\frac{n}{2x(1-x)}}\right) = \alpha. \tag{13}$$

On the other hand it is well known that in this case the Bayes formula supplies
a posterior probability $p_1(\vartheta \mid x)$ which turns out to be more and more independent
of the prior probability $p_0(\vartheta)$ when n increases. It has been shown that the
asymptotic expression for $p_1(\vartheta \mid x)$, whatever $p_0(\vartheta)$ may be, is

$$p_1(\vartheta \mid x) \sim \sqrt{\frac{n}{2\pi\,x(1-x)}}\; e^{-n(\vartheta - x)^2 / 2x(1-x)}. \tag{14}$$

It follows that, on the basis of the Bayes formula, we can predict for every
single value of x with the probability $\alpha$ that $\vartheta$ lies between the above given
limits (13). This is more than the confidence limits method supplies, but the
result is subjected to the restriction that $p_0(\vartheta)$ is a continuous function. However,
for large values of n (generally this means for large numbers of basic experiments)
the outcomes of both methods are essentially the same.
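The half-width η of the limits (13) is easily computed. The sketch below assumes that the probability integral Φ is taken in the error-function form, Φ(u) = (2/√π) ∫₀ᵘ exp(−z²) dz, so that the defining condition reads Φ(η √(n / 2x(1 − x))) = α; the function names `phi` and `eta` are introduced here for illustration.

```python
import math
from statistics import NormalDist

def phi(u):
    # Probability integral in the error-function form (an assumption stated above).
    return math.erf(u)

def eta(x, n, alpha=0.90):
    """Half-width eta with phi(eta * sqrt(n / (2 x (1 - x)))) = alpha."""
    u = NormalDist().inv_cdf((1 + alpha) / 2) / math.sqrt(2)   # solves erf(u) = alpha
    return u * math.sqrt(2 * x * (1 - x) / n)

for n in (100, 1_000, 10_000):
    h = eta(0.5, n)
    print(n, 0.5 - h, 0.5 + h)
```

The limits close in on the observed x at the rate 1/√n, which is the concentration the text refers to.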

Let me recapitulate in three brief sentences the essential results we have
found in the problem of estimation.

1. There is no contradiction of any kind between the Bayes formula and the
confidence limits method and no difference at all in the underlying probability
concept. In both methods the idea of a sort of "super-population" is used.
Only two different kollektivs are considered in both cases.

2. If the original distribution has a regular, slightly varying density function
$p(x \mid \vartheta)$, the Bayes method gives a complete answer when the prior probability
is known and no answer when it is unknown. The confidence limits method gives
in both cases a definite solution; it lies in the nature of things that the solution
cannot be very substantial if $p(x \mid \vartheta)$ is only slightly varying.





3. If the original distribution $p(x \mid \vartheta)$ depends on a further parameter n and
becomes concentrated more and more with increasing n, both approaches give,
for large n, asymptotically about the same results.

It is not intended by these remarks to impair the value of the confidence
limits method which both from a theoretical and from a practical point of view
deserves our attention. But the rather inconceivably aggressive attitude
towards the Bayes theory as displayed by a number of statisticians, which,
however, does not include J. Neyman, turns out to be completely unfounded.



PROBABILITY AS MEASURE 


By J. L. Doob
University of Illinois 

The following pages outline a treatment of probability suitable for statisticians
and for mathematicians working in that field. No attempt will be made
to develop a theory of probability which does not use numbers for probabilities.
The theory will be developed in such a way that the classical proofs of probability
theorems will need no change, although the reasoning used may have a
sounder mathematical basis. It will be seen that this mathematical basis is
highly technical, but that, as applied to simple problems, it becomes the set-up
used by every statistician. The formal and empirical aspects of probability
will be kept carefully separate. In this way, we hope to avoid the airy flights
of fancy which distinguish many probability discussions and which are irrelevant
to the problems actually encountered by either mathematician or statistician.

We shall identify as Problem I the problem of setting up a formal calculus to
deal with (probability) numbers. Within this discipline, once set up, the only
problems will be mathematical. The concepts involved will be ordinary mathematical
ones, constantly used in other fields. The words "probability,"
"independent," etc. will be given mathematical meanings, where they are used.

We shall identify as Problem II the problem of finding a translation of the 
results of the formal calculus which makes them relevant to empirical practice. 
Using this translation, experiments may suggest new mathematical theorems. 
If so, the theorems must be stated in mathematical language, and their validity 
will be independent of the experiments which suggested them. (Of course, if a 
theorem, after translation into practical language, contradicts experience, the 
contradiction will mean that the probability calculus, or the translation, is 
inappropriate.) 

The classical probability investigators did not separate Problems I and II
carefully, thinking of probability numbers as numbers corresponding to events
or to hypothetical truths, and always referring the numbers back to their
physical counterparts. The measure approach to the probability calculus has
put this approach into abstract form, and separated out the empirical elements,
thus removing all aspects of Problem II. We shall explain this approach first
in a simplified set-up, that which will be made to correspond (Problem II) to a
repeated experiment in which the result of the nth trial can be any integer $x_n$
between 1 and N (inclusive), in which the experiments are independent of each
other, and performed under the same conditions. (The set-up will be applicable,
for example, to the repeated throwing of a die.)



The measure approach treats this experiment as follows. Let $\omega$: $(x_1, x_2, \ldots)$
be any sequence of integers between 1 and N, inclusive. We consider $\omega$ as a
point in an infinite dimensional space $\Omega$. (Each point $\omega$ may be considered as a
logically possible sequence of results of the given experiment, and this fact will
guide us in solving Problem II.) A measure function is defined on certain sets
of points of $\Omega$ as follows. Let $p_1, \ldots, p_N$ be any numbers satisfying the
conditions

$$p_j \ge 0, \quad j \ge 1, \qquad p_1 + \cdots + p_N = 1.$$

(How these numbers are chosen in any particular problem will be explained
below. The method of choice is irrelevant to the mathematics, but is involved
in the solution of Problem II.) The set of all sequences beginning with $x_1 = a$
is given measure $p_a$. More generally, the measure of the set of all sequences
beginning with $x_1 = a_1, \ldots, x_n = a_n$, is defined as $p_{a_1} p_{a_2} \cdots p_{a_n}$. In this
way, as can be shown,¹ a completely additive measure function is determined
on certain point sets of $\Omega$, on a field $\mathfrak{F}$ of sets so large that all the usual Lebesgue
measure and integration theory is applicable. This means that there is a collection
$\mathfrak{F}$ of sets of points of $\Omega$ such that if $S_1, S_2, \ldots$ are finitely or infinitely
many sets in the collection, their sum $\sum_{1}^{\infty} S_n$, their intersection $\prod_{1}^{\infty} S_n$, and
their complements are also in the collection. Each set S in $\mathfrak{F}$ has a definite
measure P(S), $0 \le P(S) \le 1$, and if $S_1, S_2, \ldots$ are finitely or infinitely many
disjunct sets in $\mathfrak{F}$,

$$P(S_1 + S_2 + \cdots) = P(S_1) + P(S_2) + \cdots.$$
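The product rule for sets of sequences with a prescribed beginning can be tried out directly. The following sketch uses a hypothetical choice of the numbers p_j (a die with six equally weighted faces) and a helper `cylinder_measure` introduced here for illustration; it checks additivity only on finitely many disjoint sets, which is of course much less than complete additivity.

```python
from itertools import product
from math import isclose, prod

def cylinder_measure(p, prefix):
    """Measure of the set of all sequences beginning with x_1 = a_1, ..., x_n = a_n."""
    return prod(p[a - 1] for a in prefix)    # p[a - 1] holds p_a for a = 1, ..., N

# Hypothetical choice of the p_j: a die with N = 6 equally weighted faces.
N = 6
p = [1 / N] * N

print(cylinder_measure(p, (6,)))
print(cylinder_measure(p, (6, 6, 1)))

# Additivity on disjoint sets: the sets "x_1 = 6, x_2 = b" for b = 1, ..., N
# are disjoint and their sum is the set "x_1 = 6".
parts = sum(cylinder_measure(p, (6, b)) for b in range(1, N + 1))
print(isclose(parts, cylinder_measure(p, (6,))))

# The sets with a prescribed beginning of fixed length exhaust the whole space,
# so their measures add up to 1.
total = sum(cylinder_measure(p, pre) for pre in product(range(1, N + 1), repeat=3))
print(isclose(total, 1.0))
```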

Problem II, the translation problem, is solved as follows. Each relevant
event is made to correspond to a point set of $\Omega$. A relevant event is a physical
concept, defined by imposing some set C of conditions on the results of the
experiments. The corresponding $\Omega$-set is the set of sequences $(x_1, x_2, \ldots)$
satisfying the same set C of conditions, imposed on the $x_j$. Thus the set of all
sequences beginning with $x_1 = a_1$, $x_2 = a_2$, is made to correspond to the event:
the result of the first experiment is $a_1$, of the second is $a_2$. As is to be expected,
the mathematical picture goes further than the real one. The "event": 1 occurs
infinitely often in a sequence of trials, has only conceptual significance, physically,
but the corresponding point set of $\Omega$: the set of all sequences $(x_1, x_2, \ldots)$ containing
infinitely many 1's, is a perfectly definite point set whose measure can
be calculated in terms of $p_1, \ldots, p_N$. (In fact it is easily seen that this
measure is 1 or 0, according as $p_1 > 0$ or $p_1 = 0$.) By "the probability of an
event" we shall mean the measure of the corresponding $\Omega$-set. As this measure
has been defined, the probability that the nth trial results in a number j is $p_j$,
and the probability that one trial results in j, and another in k, is $p_j \cdot p_k$.

1 Cf. A. Kolmogoroff, Ergebnisse der Mathematik, Vol. 2, No. 3, Grundbegriffe der
Wahrscheinlichkeitsrechnung, where the most complete treatment of the approach to the
probability calculus from the standpoint of measure is given.





The justification of the above correspondence between events and $\Omega$-sets is
that certain mathematical theorems can be proved, filling out a picture on the
mathematical side which seems to be an approximation to reality, or rather an
abstraction of reality, close enough to the real picture to be helpful in prescribing
practical rules of statistical procedure. The following two theorems are important
ones, from this point of view. These two theorems depend in no way
on observed facts. They are stated and proved in the customary language of
modern analysis.

Theorem A: Let $j_n$ be the number of the first n coordinates of the point
$\omega$: $(x_1, x_2, \ldots)$ which are equal to j, where j is some integer ($1 \le j \le N$) which
will be kept fixed throughout the discussion. Then $0 \le j_n \le n$, and $j_n$ varies from
point to point on $\Omega$: $j_n = j_n(\omega)$ is a function of $\omega$, that is of the sequence $(x_1, x_2, \ldots)$.

When $n \to \infty$, $j_n/n$ has not a unique limit independent of the sequence
$(x_1, x_2, \ldots)$ under consideration. In fact if $\omega$ is the point $(k, k, \ldots)$, $j_n(\omega) = 0$
for all n, unless j = k; if $\omega$ is the point $(j, j, \ldots)$, $j_n(\omega) = n$ for all n. It is
simple to give examples of sequences $\omega$: $(x_1, x_2, \ldots)$ for which $j_n(\omega)/n$ oscillates
without approaching a limit, as $n \to \infty$. But Theorem A (usually called the
strong law of large numbers) states that there is a set of sequences, i.e. an $\omega$-set S,
of measure 0, such that

$$\lim_{n \to \infty} \frac{j_n(\omega)}{n} = p_j \tag{1}$$

unless $\omega$ is in S. In other words the sequences for which (1) is not true are
exceptional in the sense of measure theory. If a new choice $\{p'_j\}$ of $p_j$'s is made,
then if $p'_j \ne p_j$, the new exceptional set includes all the sequences which were
not exceptional before, since the limit in (1) becomes $p'_j$. Thus S depends
essentially on $p_j$. Theorem A is a generalization of Bernoulli's classical theorem
which states in our language that the measure of the set of sequences
$\omega$: $(x_1, x_2, \ldots)$ for which

$$\left| \frac{j_n(\omega)}{n} - p_j \right| > \epsilon$$

approaches 0, as $n \to \infty$, for any positive $\epsilon$. Theorem A is stronger because it
states that there is actual convergence, whereas Bernoulli's theorem only concludes
that there is a kind of convergence on the average.
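The behaviour of the ratios j_n/n can be looked at along a single sequence. The sketch below is an illustration only and proves nothing about sets of measure 0: the values of the p_j, the seed and the helper `success_ratios` are assumptions, and a pseudo-random sequence stands in for a point ω drawn according to the measure.

```python
import random

def success_ratios(omega, j):
    """Return the list of ratios j_n / n for n = 1, ..., len(omega)."""
    count = 0
    ratios = []
    for n, x in enumerate(omega, start=1):
        if x == j:
            count += 1          # j_n: number of the first n coordinates equal to j
        ratios.append(count / n)
    return ratios

# Hypothetical choice: N = 4 with p_1, ..., p_4 as below, and j = 3 kept fixed.
p = [0.1, 0.2, 0.3, 0.4]
rng = random.Random(3)
omega = rng.choices(range(1, 5), weights=p, k=200_000)   # independent trials

ratios = success_ratios(omega, j=3)
for n in (10, 1_000, 200_000):
    print(n, ratios[n - 1])

# Two sequences of the exceptional set: the ratio does not approach p_3 = 0.3.
print(success_ratios([2] * 1_000, j=3)[-1])   # the point (k, k, ...) with k = 2
print(success_ratios([3] * 1_000, j=3)[-1])   # the point (j, j, ...)
```

The sampled sequence shows the clustering of the ratios near p_3, while the two constant sequences are logically possible points of the space for which (1) fails.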

Theorem A corresponds to certain observed facts, relating to the clustering
of "success ratios," giving rise to empirical numbers. If the statistician
wishes to apply his calculus to a given experiment (Problem II), he sets the $p_j$
equal to these empirical numbers.

There has been frequent discussion of the problem of determining the $p_j$.
This discussion of the $p_j$ is sometimes held on so high a plane that the innocent
bystander may wonder to what purpose such abstract philosophic concepts could
possibly be put, besides that of stimulating further discussion on a still higher
plane. The principal purpose of this paper is to discuss Problem I, but a few
words on Problem II might not be out of place here. Almost everyone who is
going to use probability numbers, the $p_j$, for other than conversational purposes,





derives them in the same way. There is a judicious mixture of experiments
with reason founded on theory and experience. Thus if a coin is tossed by an
experimenter who has examined the coin, and found that it had heads on one
side but not on both, that it seemed balanced, and that (as a confirming check)
tossing a hundred times gave around 50 heads, the experimenter would use 1/2
as the probability of obtaining heads in his further reasoning. Of course there
is no logic compelling this. The experimenter may have been fooled. A coin
far out of balance may turn up 50 heads in 100 throws. But man must act,
and the above procedure has been found useful, which is all that is desired. In
many experiments, less reliance can be placed on a preliminary physical examination
of the experimental conditions, and more must be placed on the actual
working out of the experiment, as in the analysis of machine products. In that
case, the actual results must be examined with great care, before attempting
to use the above mathematical set-up. It sometimes may even be possible to
change the experimental conditions to make the mathematics applicable.² In
all cases, such mathematical theorems as Theorem A and the following Theorem B
give the basis for applying the formal apparatus to practice. Indeed,
the criterion of application includes the verification of special cases of the practical
versions of Theorems A and B.

Theorem B: Let $f_n(x_1, \ldots, x_{n-1})$ (n > 1) be any function of the indicated
variables, except that we suppose $f_n$ only takes on the values 0, 1. Let $\omega$: $(x_1, x_2, \ldots)$
be a given point of $\Omega$. Let n' be the number of the first n integers i such that
$f_i(x_1, \ldots, x_{i-1}) = 1$, and let $j'_n$ be the number of the first n integers i such that
$f_i(x_1, \ldots, x_{i-1}) = 1$, and $x_i = j$. Then $j'_n$, n' are functions of $\omega$: $(x_1, x_2, \ldots)$.

If $f_1 \equiv f_2 \equiv \cdots \equiv 1$, $j'_n = j_n$, n' = n, where $j_n$ is as defined above. Suppose
that there is an $\Omega$-set $S_0$ of measure 0 such that $n' \to \infty$, as $n \to \infty$, unless $\omega \in S_0$.
Theorem B states that there is then an $\Omega$-set S' of measure 0, such that if
$\omega$: $(x_1, x_2, \ldots)$ is not in S',

$$\lim_{n \to \infty} \frac{j'_n(\omega)}{n'} = p_j. \tag{1'}$$


(The set S' will depend on the given functions $f_1, f_2, \ldots$ and on the $p_j$, but is
fixed, once these have been chosen.) This mathematical theorem corresponds
to certain observed facts (usually summarized by stating that no (successful)
system of play is possible). In fact, it states, in the language of practice, that
rejecting certain trials, using as a criterion of acceptance or rejection the results
of preceding trials, rejecting the ith trial if $f_i(x_1, \ldots, x_{i-1}) = 0$, does not affect
the outcome of a game of chance, or, more precisely, does not affect the validity
of the physical fact corresponding to Theorem A. If $f_1 = f_2 = \cdots = 1$, (1')
becomes (1). The hypothesis that $n' \to \infty$ as $n \to \infty$ unless $\omega \in S_0$ is made to
insure that infinitely many trials will be accepted. As an example of the


2 Cf. W. A. Shewhart, Statistical Method from the Viewpoint of Quality Control,
Washington, 1939.





possible variety in the definition of the $f_i$, we might define $f_i$ as 1 if $x_{i-1} = N$,
and $f_i = 0$ otherwise, so trials are accepted only if the previous trial resulted in
the number N. Or much more complicated systems can easily be devised in
which the criterion of acceptance of the nth trial depends on a varying number
of the results of preceding trials. This theorem gives a mathematical counterpart
to the physical idea of the mutual independence of repeated trials.
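A system of play of the kind described can be tried on a simulated sequence. The sketch below is illustrative and rests on assumptions not in the text: the p_j, the seed, the helper `selected_ratio` and the third rule `after_two_misses` are hypothetical, while `after_N` follows the example of accepting a trial only if the previous trial resulted in N.

```python
import random

def selected_ratio(omega, j, rule):
    """Return (j'_n / n', n') after applying a selection rule to the trials."""
    history = []
    accepted = hits = 0
    for x in omega:
        if rule(history):        # f_i may look only at x_1, ..., x_{i-1}
            accepted += 1
            if x == j:
                hits += 1
        history.append(x)
    ratio = hits / accepted if accepted else float("nan")
    return ratio, accepted

N = 4
p = [0.1, 0.2, 0.3, 0.4]         # hypothetical p_1, ..., p_N
rng = random.Random(4)
omega = rng.choices(range(1, N + 1), weights=p, k=200_000)

def accept_all(history):
    # f_1 = f_2 = ... = 1: every trial is accepted.
    return True

def after_N(history):
    # Accept a trial only if the previous trial resulted in the number N.
    return bool(history) and history[-1] == N

def after_two_misses(history, j=3):
    # Accept a trial only after two consecutive results different from j.
    return len(history) >= 2 and history[-1] != j and history[-2] != j

for rule in (accept_all, after_N, after_two_misses):
    print(rule.__name__, selected_ratio(omega, 3, rule))
```

Each rule accepts a different number of trials, yet the ratio among the accepted trials stays near p_3 in all three cases.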

To summarize, mathematically (Problem I) the study has been reduced to
that of the measure properties of $\Omega$. This can be considered independently of
any physical correspondence. The physical correspondence (Problem II) makes
any event $\mathfrak{E}$ correspond to a point set E of $\Omega$; the "probability of $\mathfrak{E}$" becomes
the measure of E. Thus "the probability that the result of the first experiment
is 3" becomes the measure of the set of sequences $(x_1, x_2, \ldots)$ beginning with
$x_1 = 3$. We have given no sharp definition of probability as a physical concept.
If the above mathematical set-up, after translation, using some set of $p_j$'s,
seems to fit a given physical set-up, any event will be said to have as its probability
the measure of the corresponding $\Omega$-set. We have attempted to give no
intrinsic a priori definition of the probability of an event: such a definition is
quite unnecessary for our purposes. All that was required was a basis for prescribing
the usual statistical procedures, and we have described such a basis.

In the above example, there would have been no new difficulty introduced
if the $x_n$ were not restricted to integral values, but allowed to take on any
numerical values. The general point $\omega: (x_1, x_2, \cdots)$ of Ω would now be any
sequence of real numbers. Instead of choosing the numbers $p_1, \cdots, p_N$ we
choose a “distribution function” F(x), a monotone function with the following
properties:

$\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1, \quad F(x - 0) = F(x).$

Measure on Ω is defined as follows. The set of all sequences beginning with $x_1$
such that $a \le x_1 < b$ is given measure $F(b) - F(a)$. (The number F(b) is
called “the probability that $x_1 < b$.”) More generally, the measure of the set
of all sequences $(x_1, x_2, \cdots)$ beginning with $x_1, \cdots, x_n$, such that $a_j \le x_j < b_j$,
$j = 1, \cdots, n$, is defined as $\prod_{j=1}^{n} [F(b_j) - F(a_j)]$. Thus if F(x) defines a
simple rectangular distribution: $F(x) = 0$ for $x < 0$, $F(x) = x$ for $0 \le x \le 1$,
$F(x) = 1$ for $x > 1$, Ω-measure becomes (infinite dimensional) volume in the
(infinite dimensional) unit cube. The correspondence (Problem II) between
events and point sets of Ω is defined just as before. Sometimes it may be useful,
in considering experiments giving rise to pairs of numbers, to let each $x_n$ be a
pair of numbers so that Ω becomes a sequence of points of a plane instead of a
sequence of points of a line. In all cases there are mathematical theorems
true of the resulting Ω which guide us (Problem II) in deciding just how the
Ω-measure is to be defined, that is, how F(x) is to be defined, in dealing with a
given practical problem. But the essential point is this. Once Ω-measure has
been defined, no changes or further hypotheses are possible or necessary. All





relevant probability questions are answerable. Thus consider a question of the
following type: if the experiments are grouped in some way,³ with what probability
will the groups have some given regularity property?⁴ The question singles
out a set E of sequences of Ω and asks: what is the measure of E? The problem
may or may not be difficult mathematically,⁵ depending on the grouping, but
the original definition of measure on Ω needs no enlargement to answer it.
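The grouping question of the footnotes (a game is won by the first player to take two out of three trials) can be checked by a small simulation. This is a sketch only: the value p = 0.6, the number of games, the seed and the function names are assumptions made for illustration, and a pseudo-random generator stands in for the independent trials. The simulated ratio (games won by player a)/(games played) is compared with the value $p^3 + 3p^2(1 - p)$ quoted in the footnote.

```python
import random

def game_win_probability(p):
    """Probability that player a wins a two-out-of-three game,
    when p is the probability that a wins a single trial."""
    return p**3 + 3 * p**2 * (1 - p)

def simulate_game_ratio(p, games, seed=0):
    """Group successive trials into games of two or three trials and
    return (games won by player a) / (games played)."""
    rng = random.Random(seed)
    won = 0
    for _ in range(games):
        a_wins = b_wins = 0
        while a_wins < 2 and b_wins < 2:   # a game ends at the second win
            if rng.random() < p:
                a_wins += 1
            else:
                b_wins += 1
        won += (a_wins == 2)
    return won / games

p = 0.6                                    # hypothetical trial probability
print(game_win_probability(p), simulate_game_ratio(p, games=20_000))
```

The simulation uses nothing beyond the measure already defined for the single trials; the grouping only singles out a different set of sequences.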

Technically, the mathematics has become the mathematics of a special type
of measure defined on a space of infinitely many dimensions. If, however, there
is an integer ν such that at most ν experiments are to be considered, we
need only consider the ν-dimensional space of points $(x_1, \cdots, x_\nu)$, defining
measure in this space in the same way as on Ω. Thus if $x_n$ has the rectangular
distribution defined above, the measure in $(x_1, \cdots, x_\nu)$-space becomes ordinary
ν-dimensional volume in the unit cube. Perhaps the most common measure a
statistician considers is that in which the measure of an $(x_1, \cdots, x_\nu)$-set E
becomes “the probability that the point $(x_1, \cdots, x_\nu)$ representing an independent
sample of ν from a normal distribution of mean 0 and variance $\sigma^2$
will lie in E”:

(2)  $P\{E\} = \sigma^{-\nu} (2\pi)^{-\nu/2} \int \cdots \int_E e^{-(x_1^2 + \cdots + x_\nu^2)/2\sigma^2} \, dx_1 \cdots dx_\nu.$

This example makes it obvious that the statistician is always doing measure
theory, even though he may not state that fact explicitly. If the number of
experiments has no upper bound conceptually (mathematically, when the number
of dimensions ν may increase without limit, as in Theorems A, B), it is much
more convenient to use the space Ω, in terms of which experiments with varying
numbers of trials can be considered simultaneously. The classical proofs of
probability theorems, such as Bernoulli's theorem (the law of large numbers),
are perfectly correct. If the “probability of an event” is interpreted as the
measure of a set, these proofs do not even need verbal changes. There can be
no question of the need for any axiomatic development beyond that necessary
for measure theory, and the probability calculus can lead to no contradiction,
unless the theory of measure is faulty.
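For a set E that is a box, $a_j \le x_j < b_j$, the measure (2) factors into the product $\prod [F(b_j) - F(a_j)]$ with F the normal distribution function. The sketch below, with ν = 2, σ = 1 and a box chosen purely for illustration (the function names are likewise hypothetical), compares that product with a direct midpoint-rule evaluation of the double integral in (2).

```python
import math

def normal_cdf(x, sigma):
    """Distribution function F(x) of a normal law with mean 0 and variance sigma^2."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def box_measure(bounds, sigma):
    """Product of F(b_j) - F(a_j) over the sides (a_j, b_j) of the box."""
    measure = 1.0
    for a, b in bounds:
        measure *= normal_cdf(b, sigma) - normal_cdf(a, sigma)
    return measure

def riemann_measure_2d(bounds, sigma, steps=200):
    """Midpoint-rule approximation of the integral (2) over a box, for nu = 2."""
    (a1, b1), (a2, b2) = bounds
    h1, h2 = (b1 - a1) / steps, (b2 - a2) / steps
    norm = sigma ** -2 * (2.0 * math.pi) ** -1      # sigma^(-nu) (2 pi)^(-nu/2), nu = 2
    total = 0.0
    for i in range(steps):
        x1 = a1 + (i + 0.5) * h1
        for j in range(steps):
            x2 = a2 + (j + 0.5) * h2
            total += math.exp(-(x1 * x1 + x2 * x2) / (2.0 * sigma ** 2))
    return norm * total * h1 * h2

box = [(-1.0, 1.0), (0.0, 2.0)]                     # hypothetical set E
print(box_measure(box, 1.0), riemann_measure_2d(box, 1.0))
```

The two numbers should agree to several decimals, the first being the measure as originally defined and the second the integral form familiar to the statistician.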

It is customary for probability theorists to stop their discussions when the
present stage is reached, so that the beginnings of a formal calculus have been
constructed to deal with a repetition of independent experiments, conducted

3 A grouping is necessary, for example, when two players are playing a game in which
two out of three wins in the trials win a game. The trials are then grouped into successive
groups of two or three, depending on how they come out.

4 Continuing the preceding note, the question might be: will the ratio (games won by
player a)/(games played) approach a limit with probability 1, that is, for all of the original
sequences $\{x_n\}$ except possibly some forming a set of measure 0?

5 The answer to the question of the preceding notes is simple. If p is the probability
that player a wins a trial, the ratio in question approaches $p^3 + 3p^2(1 - p)$, the probability
that a wins a game, with probability 1.





under the same conditions. Perhaps this is because of the following widely held
syllogism: probability is something dealing with random events; random events
are events having no influence on each other; therefore.... Unfortunately
mathematicians and statisticians must deal with many problems involving
dependent probabilities, whose solutions require the most delicate and careful
applications of modern analysis. The rudimentary calculi which the outsiders
find esthetically or philosophically pleasing are usually either insufferably
awkward or completely insufficient for the needs of professionals. There is a strange
situation, which one observer has facetiously described somewhat as follows: it
is true with probability 1 that the technical workers in probability use the
measure approach, but that the writers on “probability in general,” descendants
of Carlyle's professor, do not consider this approach worth much more than a
passing remark.⁶ The following pages outline how our previous treatment is
generalized to deal with problems in which it is desirable to have the distribution
of $x_j$ vary with j (so that physically the experiments are no longer the same),
and in which the $x_j$ do not have to correspond to the results of independent
experiments. Some attempt will also be made to show how the modern
mathematical theory of real functions is applied to the probability calculus.

Let $x_j = x_j(\omega)$ be the jth coordinate of the point $\omega: (x_1, x_2, \cdots)$. Then as
the sequence $\omega: (x_1, x_2, \cdots)$ varies, $x_j$ does also: $x_j(\omega)$ is a function of ω. The
functions $x_1(\omega), x_2(\omega), \cdots$ are functions defined on Ω, an abstract space on which
a measure has been defined. Moreover Ω-measure has been defined in such a
way that the Ω-set for which $x_j(\omega) < K$ (j, K fixed) is an Ω-set whose measure
has been defined. (This set is composed of all sequences $(x_1, x_2, \cdots)$ whose
jth coordinate is $< K$, and the measure is F(K), using our last definition of
Ω-measure.) In the terminology of measure theory, $x_j(\omega)$ is thus a measurable
function. The study of the measure relations of Ω, and this is the whole of our
probability calculus, can be considered, from this point of view, as the study of
the properties of a sequence of measurable functions, one with very special
properties, as we shall see, defined on some space. A measurable function
defined on Ω is usually called a chance variable, in the theory of probability.
(This terminology is somewhat dangerous, because it mixes Problems I and II.)
The whole apparatus of modern real variable theory is applicable to these
chance variables. Thus if $f(\omega)$ is a chance variable (measurable function of ω)
(physically, a function of the observations), it is customary to define a number
called its expectation. This number is simply the integral of $f(\omega)$, with respect
to the given Ω-measure. The fact that the expectation of the sum of two chance
variables is the sum of their expectations is simply the familiar theorem that the
integral of the sum of two functions is the sum of their integrals. Let S(j, K)
be the Ω-set defined by the inequality $x_j < K$. Up to now we have supposed

6 This analysis, like every other probability statement, is only an approximation to
reality, but a fairly close one.





that the measure of S(j, K) is independent of j, that is, that the distribution of $x_j$
is independent of j. We have also supposed that⁷

(3)  $P\{S(1, K_1) \cdots S(n, K_n)\} = P\{S(1, K_1)\} \cdots P\{S(n, K_n)\}$

for any positive integer n, and numbers $K_1, \cdots, K_n$. That is, we have supposed
that $x_1(\omega), x_2(\omega), \cdots$ are mutually independent chance variables.⁸ In
fact probability measure on Ω has been defined just to make the foregoing two
facts true. Mutual independence is a very strong hypothesis to impose on a
sequence of functions. In many probability problems (Markoff chains for
example), more general measures must be defined on Ω. The sequence
$x_1(\omega), x_2(\omega), \cdots$, whose properties are those of Ω-measure, is then no longer a sequence
of independent functions, and the distribution of $x_j$ can vary with j.

At this level, the study becomes the study of any sequence of measurable
functions, defined on some space of total measure 1. If f, g are given chance
variables, they may turn out to be independent. In that case the theorem that
the expectation of their product is the product of their expectations becomes,
when translated into mathematical language, the familiar theorem that

$\iint f(x) g(y) \, dx \, dy = \int f(x) \, dx \int g(y) \, dy.$

The mathematical theorems are not simply analogues of the probability
theorems; they themselves are those theorems. When stated mathematically, the
probability theorems need no proof: they need only recognition as standard
results.

Empirical needs suggest that certain functions called conditional probability
distributions, and conditional expectations, should be defined in a certain way.
This is possible, as a formal matter,⁹ and the theorems then proved about these
functions give them their usual meaning when translated into practical language.
These functions are extremely useful tools in dealing with mutually dependent
(that is, not independent) chance variables.

The above approach is easily generalized to the stage needed in the study of 
Brownian movements or of time series, in which, instead of the proper initial 

7 $P\{S\}$ was defined as the measure of the Ω-set S.

8 The n chance variables $f_1(\omega), f_2(\omega), \cdots, f_n(\omega)$ are said to be independent if for every
set of n numbers $K_1, \cdots, K_n$, the following equality is true:

$P\{f_i(\omega) < K_i,\ i = 1, \cdots, n\} = \prod_{i=1}^{n} P\{f_i(\omega) < K_i\},$

where $P\{\cdots\}$ denotes the Ω-measure of the Ω-set defined by the conditions in the braces.
Thus in the example of a normal distribution in ν dimensions given above, $x_1, \cdots, x_\nu$
are independent functions on the space of ν dimensions, a fact which follows readily from
the fact that the ν-dimensional density function is the product of ν functions of the separate
variables.

9 Cf. Kolmogoroff, loc. cit.





abstraction being a sequence $\{x_n\}$ of numbers, we have a one-parameter family
$\{x_t\}$ (t takes on all real values). The number $x_t$ may, for example, be thought
of as the x-coordinate of a particle at time t. There is no difference in principle
here: Ω is now the space of functions of t, instead of the space of sequences, that
is, functions of n. From the other point of view, instead of studying the properties
of a sequence of measurable functions, it becomes necessary to study the
properties of a one-parameter family of measurable functions.



DISCUSSION OF PAPERS ON PROBABILITY THEORY 

By R. von Mises and J. L. Doob

1. Comments by R. von Mises. Professor Doob outlines a new theory of
probability starting with the following three basic conceptions. First, he uses
the notion of an infinite sequence of trials or better, of an infinite sequence of
numbers $x_1, x_2, x_3, \cdots$ which can be considered as the outcomes of infinitely
repeated uniform experiments. Second, he introduces (in his Theorem A) the
limit of the relative frequency of a particular outcome a. Third, (in his Theorem B)
the notion of place selection defined by a sequence of functions
$f_n(x_1, x_2, \cdots, x_{n-1})$ is employed. All these three concepts are completely
strange to the so called classical theory as developed by Bernoulli, Laplace,
Poisson, etc. They have been introduced and made the corner stone of probability
theory in my papers published since 1919. I daresay that in no probability
investigation before 1919 were any of those notions even mentioned.

This concerns what Professor Doob calls the Problem I or the purely mathematical
aspect of the question. As to his Problem II or the relationship between
the formal calculus and real facts, Professor Doob stresses that the actual values
for probabilities that enter as data into a particular argument have to be drawn
from long, finite sequences of experiments. This is in complete accordance
with the standpoint of my theory and in strict contradiction to the classical
conception which knows only “a priori” probabilities determined by “equally
likely cases.”

In both theories, Professor Doob's and mine (not in the classical) a mathematical
model or picture is associated with a long sequence of uniform
experiments. These models are different in both theories. My model (the
“kollektiv”) consists of one infinite sequence $\omega: x_1, x_2, x_3, \cdots$ in which the
limit of the relative frequency of each possible outcome a exists and is indifferent
to a place selection; the value of this limit is called the probability of a.

On the other hand Professor Doob's model implies all logically possible sequences
which form a space Ω and he shows that in this space a measure function
can be introduced which fulfills the following conditions: (1) If m is a positive
integer, the set of all sequences the mth element of which is a has a measure $p_a$
independent of m; (2) the set of all sequences in which the relative frequency
of a-results has either no limit or a limit different from $p_a$ has measure zero; (3) if S is any
place selection, the set of all sequences ω for which the relative frequency of a
in S(ω) has either no limit or a limit different from $p_a$ likewise has measure zero; this value
$p_a$ is called the probability of the outcome a. It then can be shown that a
probability in this sense can be ascribed to certain events, i.e. to certain types
of experiments which in some way are connected with the sequence of basic



experiments. E.g. if the original sequence consists of the single successive
tossings of a die, the derived sequence may consist of pairs of tossings with the
sum of the outcoming points as new value of a. The new probabilities $p'_a$ are
found as measures of certain sets in the original measure system established in Ω.

There is no doubt that the model used by Professor Doob for representing
empirical sequences of uniform experiments is logically consistent. Its practical
usefulness depends on how the usual problems of combining different kollektivs
and so on can be settled within this scheme. This has to be shown in detail.
It seems to me that my conception is simpler in its application and closer to
reality, while his model may be considered more satisfactory from a logical
standpoint since it avoids the difficulties connected with the concept of “all
place selections.” At any rate, however, there is no contradiction or irreconcilable
contrast: both theories are essentially statistical or frequency theories,
equally far from the classical conception based on “equally likely cases.” In
both theories probabilities are, of course, measures of sets.

2. Comments by J. L. Doob. It is perhaps unfortunate that Professor von
Mises' treatment of probability problems, based on typical sequences (“collectives,”
“admissible numbers”), is commonly called the “frequency theory.”¹ It
is clear to any reader of our papers (identified as M and D below) that the idea
of frequency, at least in the discussion of the relation of mathematics to practice,
is no more fundamental to one approach than to the other. In one mathematical
treatment frequency notions first appear in the theorems, whereas in
the other they first appear in the axioms; but they appear in both. The principal
objection the measure advocates have to the frequency approach is that it is
awkward mathematically. Anyone who doubts this awkwardness need only
examine various books published recently, using this approach, to see what a
lot of fussy detail is involved merely in proving such elementary results as the
Tchebycheff inequality or the Bernoulli theorem. One author considers it necessary
to have his chance variables so restricted that if x is a chance variable, the
event x < k has a probability assigned to it only if k is not in some exceptional
set, which may be infinite. To take another example, consider the coin tossing
game discussed in both M and D, in which two out of three wins at tosses win
a game. Apparently the probability analysis of this game is somewhat difficult
in terms of the frequency theory. As the quite elementary treatment outlined
in D shows, there is no difficulty involved, using the measure approach. The
question is simple: a set of chance variables is given (corresponding to the
original tosses); a new set is determined from them (corresponding to the
grouping into games). Only elementary algebraic manipulation is required to
verify that the new chance variables are mutually independent in the mathematical
sense (Cf. D), and have the same distribution, so the law of large
numbers is applicable. Professor von Mises considers that the measure theory
cannot handle this problem. I on the other hand consider that this problem
exhibits the mathematical disadvantages of the frequency theory.


1 This identifying name will be used below also. 





The frequency theory reduces everything to the study of sequences of mutually
independent chance variables, having a common distribution. “Probability
theory is the study of the transformations of admissible numbers” writes Professor
von Mises. This point of view is extremely narrow. Many problems of
probability, say those involved in time series, can only be reduced in a most
artificial way to the study of a sequence of mutually independent chance variables,
and the actual study is not helped by this reduction, which is merely a
tour de force.

It is claimed in M that the axioms of measure theory only describe the distribution
within one collective (M, p. 00). This statement seems to mean that
only the measure relations (using the notation of D) of the first coordinate
function $x_1(\omega)$ can be discussed in the measure theory, that is, only probabilities
of the type: the probability that $x_1 < k$ (in the language of practice, “the
probability that the result of the first experiment is less than k”) are discussed.
Actually, however, (Cf. D) the measure theory can discuss any number of experiments
simultaneously, using the appropriate space Ω.

Many of the debates between the advocates of the various probability theories
have been wasted, because some of the debaters talk mathematics, others physics.
With this in mind, I should like to stress again² that (except for a few philosophically
inclined Englishmen) everyone calculates probability numbers in the
same way: a combination of reasoning based on experience and helped by
theory, with examination of the experimental conditions and the results of trials.
Frequency considerations necessarily play a large part. The fact that almost
everyone calculates probability numbers in the same way does not alter the
fact that one mathematical theory may be more useful or convenient than
another in dealing with these probability numbers.

In closing, it seems proper to call attention to what the measure advocates
consider the real services and contributions of the approach of Professor von
Mises. Professor von Mises was the first to stress the importance of the second
of two fundamental generalizations of experience in dealing with repeated mutually
independent experiments of the same character: (1) the clustering of
success ratios and (2) the fact that this clustering is unaffected by a system of
rejection as described in M and D. These two generalizations of experience are
certainly fundamental. The only point under discussion here is how such generalizations
are to be put into a mathematical setting. The original such setting
of Professor von Mises was criticized as not really mathematical. The setting
now proposed by Copeland and others is criticized by the measure advocates as
mathematically inflexible and clumsy. But it is significant that even in a treatment
of the measure approach, as in D, it was felt essential to stress the mathematical
interpretation of the two empirical generalizations of Professor von
Mises. In the terminology of D, the measure advocates consider the contribution
of Professor von Mises' approach to be a contribution to a solution of
Problem II, not to Problem I, the mathematical problem.


2 We are not talking mathematics now, but the application of mathematics.



CONTINUED FRACTIONS FOR THE INCOMPLETE BETA FUNCTION 1 


By Leo A. Aroian
Hunter College 


1. Introduction. Existing literature on the problem of calculating the incomplete
Beta function

(1.1)  $B_x(p, q) = \int_0^x x^{p-1} (1 - x)^{q-1} \, dx, \quad 0 < x < 1, \ p > 0, \ q > 0,$

and the levels of significance of Fisher's z [1] leave further work to be done.
Müller's continued fraction and a new continued fraction are shown to possess
complementary features covering the range of $B_x(p, q)$ for all values of x, p, q.
Previous methods of computing $I_x(p, q) = B_x(p, q)/B(p, q)$ are given in [2], [5],
[6], [8], [10], [13], [14], [15].

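Müller's continued fraction is

(1.2)  $I_x(p, q) = C \left[ \frac{b_1}{1+} \, \frac{b_2}{1+} \, \frac{b_3}{1+} \, \frac{b_4}{1+} \cdots \right]$

where

$C = \frac{\Gamma(p + q)}{\Gamma(p + 1)\Gamma(q)} \, x^p (1 - x)^{q-1}, \quad b_1 = 1, \quad \mu_s = \frac{q - s}{p + s},$

$b_{2s} = -\mu_s \, \frac{(p + s - 1)(p + s)}{(p + 2s - 2)(p + 2s - 1)} \cdot \frac{x}{1 - x}, \quad b_{2s+1} = \frac{s(p + q + s - 1)}{(p + 2s - 1)(p + 2s)} \cdot \frac{x}{1 - x}.$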

CO 

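A convergent infinite series $1 + \sum_{n=1}^{\infty} d_n x^n$ can be converted into an infinite continued
fraction of the form $\frac{1}{1+} \, \frac{c_1 x}{1+} \, \frac{c_2 x}{1+} \cdots$ where [4], [9] p. 304,

(1.3)  $c_1 = -\beta_1, \quad c_2 = -\frac{\beta_2}{\beta_1}, \quad c_{2s} = -\frac{\beta_{2s-3}\,\beta_{2s}}{\beta_{2s-2}\,\beta_{2s-1}}, \quad c_{2s+1} = -\frac{\beta_{2s-2}\,\beta_{2s+1}}{\beta_{2s-1}\,\beta_{2s}}, \quad s \ge 2,$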


1 Presented at a meeting of the American Mathematical Society, October 28, 1939, New
York City.




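where

$\beta_{2s} = \begin{vmatrix} 1 & d_1 & \cdots & d_s \\ d_1 & d_2 & \cdots & d_{s+1} \\ \vdots & \vdots & & \vdots \\ d_s & d_{s+1} & \cdots & d_{2s} \end{vmatrix}, \quad \beta_{2s+1} = \begin{vmatrix} d_1 & d_2 & \cdots & d_{s+1} \\ d_2 & d_3 & \cdots & d_{s+2} \\ \vdots & \vdots & & \vdots \\ d_{s+1} & d_{s+2} & \cdots & d_{2s+1} \end{vmatrix}, \quad \beta_{2s} \ne 0, \ \beta_{2s+1} \ne 0.$

The infinite continued fraction found in this manner is called the corresponding
continued fraction and the power series is said to be semi-normal if $\beta_{2s} \ne 0$,
$\beta_{2s+1} \ne 0$.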


2. A new continued fraction. Müller found his continued fraction by converting
in the manner of the preceding paragraph

(2.1)  $I_x(p, q) = \frac{\Gamma(p + q) x^p (1 - x)^{q-1}}{\Gamma(p + 1)\Gamma(q)} \left\{ 1 + \sum_{r=0}^{\infty} \frac{(q - 1)(q - 2) \cdots (q - r - 1)}{(p + 1)(p + 2) \cdots (p + r + 1)} \left( \frac{x}{1 - x} \right)^{r+1} \right\}, \quad x < \tfrac{1}{2}.$

We convert

(2.2)  $I_x(p, q) = \frac{\Gamma(p + q) x^p (1 - x)^{q}}{\Gamma(p + 1)\Gamma(q)} \left\{ 1 + \sum_{r=0}^{\infty} \frac{(p + q)(p + q + 1) \cdots (p + q + r)}{(p + 1)(p + 2) \cdots (p + r + 1)} \, x^{r+1} \right\}, \quad 0 < x < 1.$

Consequently

$\beta_1 = \frac{p + q}{p + 1}, \quad \beta_2 = \frac{(p + q)(1 - q)}{(p + 1)^2 (p + 2)},$

$\beta_{2s+1} = \frac{(p + q)(p + q + 1) \cdots (p + q + s - 1)(p + q + s)}{(p + s + 1)(p + s + 2) \cdots (p + 2s)(p + 2s + 1)} \, \beta_{2s},$

$\beta_{2s+2} = \frac{(1 - q)(2 - q) \cdots (s - q)(s + 1 - q)\,(s + 1)!}{(p + 1)(p + 2) \cdots (p + 2s + 1)(p + 2s + 2)} \, \beta_{2s+1},$

$c_{2s+1} = -\frac{(p + s)(p + q + s)}{(p + 2s)(p + 2s + 1)}, \quad c_{2s} = \frac{s(q - s)}{(p + 2s - 1)(p + 2s)},$

and

(2.3)  $I_x(p, q) = \frac{\Gamma(p + q) x^p (1 - x)^{q}}{\Gamma(p + 1)\Gamma(q)} \left( \frac{1}{1+} \, \frac{C_1}{1+} \, \frac{C_2}{1+} \cdots \right)$



where $C_s = c_s x$. By well known theorems due to Van Vleck [12] and Perron
[9] p. 347 we find (1.2) converges for $-1 < x < \tfrac{1}{2}$ and (2.3) converges for
$-\infty < x < 1$, and in the neighborhood of zero (2.2) equals (2.3). The region
of equivalence of the series and the fraction may be extended by the following
argument. Let the infinite series be terminated at some arbitrary point which
gives the desired accuracy. Then the continued fraction of the corresponding
type represents this finite series, is finite and gives the result within the desired
accuracy. The new continued fraction may also be derived by use of the hypergeometric
series [9] p. 348. A special case of (2.3) was given by Markoff [3],
pp. 135-41, [11] pp. 53-55, who applied the result only to the binomial distribution.
The associated continued fraction provides more rapid convergence than
the corresponding continued fraction. The associated continued fraction is
found by means of the hypergeometric series [9] p. 331, p. 348:


h(p,q) 


(2.4) 


It-H 


Tip + g)* p (l - xy f, . hx hx z Ifcsm 2 \ 

T(p + l)r(g) \ l + k*+ 1 + hi+ 1 + hx+ ‘ " J 

7, _ P + <7 , _ P + g + l 

fcl ~ jT+r ~ "p + r ■ 

k _ s(s — q)jp + s)ip + q + s) 

,+1 ip + 2a - l)(p + 2sY(v + 2a + 1)' 

_ siq - s) _ _ [p + s+l)ip + q + a + 2) 

ip + 2s)(p + 2s + 1) (p + 2s + 2)ip + 2s + 1) 1 


The disadvantage of (2.4) lies in the unwieldy form of computation. For properties
of an associated continued fraction and the corresponding continued
fraction in connection with convergence and the Taylor series reference is made
to [9] p. 331 and pp. 302-303.


3. Properties of the corresponding continued fraction. Müller and Soper
[5], [10], pointed out the inadvisability of integration through the mode
$x = \frac{p - 1}{p + q - 2}$. In such cases we change $I_x(p, q)$ to $I_{1-x}(q, p)$. Müller has shown
for his continued fraction that if we do not integrate through the mode (we
assume this in the remainder of the paragraph) convergents 2, 3, 6, 7, etc.,
will be greater than the true value and the remaining convergents will be less
than the true value provided q is an integer. However, if q is not an integer,
and is small (q < 20), it may happen that all convergents are above the true
value. In such cases we may consider whether Müller's continued fraction may
apply by estimating the remainder $I_x(p + s, q - s)$, after s reductions by parts
[10].

For the new continued fraction also

$|C_{2s}| < \frac{s(q - s)}{(p + 2s - 1)(p + 2s)} \cdot \frac{p - 1}{p + q - 2}, \quad |C_{2s+1}| < \frac{(p + s)(p + q + s)(p - 1)}{(p + 2s)(p + 2s + 1)(p + q - 2)}$






and $C_{2s+1} < 0$; $C_{2s} > 0$ unless $s > q$ when $C_{2s} < 0$. If $C_{2s} > 0$ then the
convergents 2, 3, 6, 7, 10, 11, etc., will be above the true value and the other
convergents will be below the true value. If $C_{2s} < 0$, then all convergents will be
above the true value. In such cases, since a remainder for the continued fraction
has not been found, it seems best to estimate $I_x(p + s, q - s)$ to obtain
an idea of the error.


4. $I_x(p + s, q - s)$ and the equivalent continued fraction. Soper [10] has
given the remainder after s reductions by raising p. This will furnish an upper
bound of the error in the corresponding continued fraction after s convergents.
The remainder, when q − s is a negative integer, is approximately

I*(p + a, q — s) 

_ 2 sin ( q - sWf(£ - l)/ 2 ?r(p +~g) J7*\Y 1 ~ x V~ f \ p+g 

(4.1) i - * \\u Vi - i/ / 1 

where £ = ? 8 . 

P + <Z 

Another approach is to use the equivalent continued fraction, for s − 1 convergents
of the equivalent continued fraction reproduce exactly s terms of the
infinite series. The infinite series and the equivalent continued fraction for the
infinite series are alike in all respects except form. By [9] p. 210, we find that
the equivalent continued fraction for (2.3) is

$W_x = \frac{\gamma_1}{1 + \gamma_1 -} \, \frac{\gamma_2}{1 + \gamma_2 -} \, \frac{\gamma_3}{1 + \gamma_3 -} \, \frac{\gamma_4}{1 + \gamma_4 -} \cdots$

where

(4.2)  $\gamma_1 = \frac{p + q}{p + 1} \, x, \quad \gamma_2 = \frac{p + q + 1}{p + 2} \, x, \quad \gamma_3 = \frac{p + q + 2}{p + 3} \, x, \quad \cdots, \quad \gamma_r = \frac{p + q + r - 1}{p + r} \, x,$

and

$I_x(p, q) = \frac{\Gamma(p + q) x^p (1 - x)^q}{\Gamma(p + 1)\Gamma(q)} \cdot \frac{1}{1 - W_x}.$


The equivalent continued fraction for Mtiller's continued fraction is given in 
[5], p. 292. 


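5. Numerical illustration. If $A_\nu$ and $B_\nu$ represent the numerator and the
denominator of the ν-th convergent of a continued fraction $\frac{a_1}{b_1+} \, \frac{a_2}{b_2+} \, \frac{a_3}{b_3+} \, \frac{a_4}{b_4+} \cdots$
then

(5.1)  $A_\nu = b_\nu A_{\nu-1} + a_\nu A_{\nu-2}, \quad B_\nu = b_\nu B_{\nu-1} + a_\nu B_{\nu-2}, \quad \nu \ge 2.$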





As an example we calculate $I_{.5}(2.5, 1.5)$, which could not be done by Müller's
continued fraction.

Convergent    A             B             A/B
     1        1             1             1
     2        1             .42857143     2.3333333
     3        1.015873016   .44444444     2.2857142
     4        .66233767     .29292929     2.2610838
     5        .64812966     .28671329     2.2605498
     6        .46471308     .20559441     2.2603391
     7        .441837914    .195475117    2.2603281
     8        .33105492     .14646345     2.2603245
     9        .30890766     .13666520     2.2603242
    10        .23762461     .10512856     2.2603240
    11        .21882154     .096809808    2.2603240

Using the value of the eleventh convergent we have $I_{.5}(2.5, 1.5) = .28779339$.
Pearson [7], p. 30, gives .2877934 and Soper [10], p. 32, gives .28779341.
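The numerical illustration can be reproduced mechanically. The sketch below is not the author's computing procedure; it simply applies the recurrence (5.1) with $b_\nu = 1$ to the corresponding continued fraction (2.3), using the coefficients $c_{2s}$, $c_{2s+1}$ given above. The function names and the default number of convergents are assumptions made for illustration.

```python
import math

def cf_coefficient(k, p, q):
    """Coefficient c_k of the corresponding continued fraction (2.3)."""
    s, odd = divmod(k, 2)
    if odd:   # k = 2s + 1
        return -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
    return s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))   # k = 2s

def convergents(x, p, q, count):
    """First `count` convergents A_v / B_v of 1/(1+ C_1/(1+ C_2/(1+ ...)))."""
    a_prev, b_prev = 0.0, 1.0      # A_0, B_0
    a_cur, b_cur = 1.0, 1.0        # A_1, B_1: the first convergent is 1/1
    values = [a_cur / b_cur]
    for v in range(2, count + 1):
        c = cf_coefficient(v - 1, p, q) * x     # partial numerator C_{v-1} = c_{v-1} x
        a_prev, a_cur = a_cur, a_cur + c * a_prev
        b_prev, b_cur = b_cur, b_cur + c * b_prev
        values.append(a_cur / b_cur)
    return values

def incomplete_beta_ratio(x, p, q, count=40):
    """I_x(p, q) from (2.3), truncated after `count` convergents."""
    factor = (math.gamma(p + q) * x**p * (1 - x)**q
              / (math.gamma(p + 1) * math.gamma(q)))
    return factor * convergents(x, p, q, count)[-1]

for v, value in enumerate(convergents(0.5, 2.5, 1.5, 11), start=1):
    print(v, round(value, 7))
print(incomplete_beta_ratio(0.5, 2.5, 1.5))
```

For these particular arguments the integral can also be evaluated in closed form as $\tfrac{1}{2} - 2/(3\pi)$, which gives an independent check on the continued fraction.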

6. Discussion of the various methods. Müller's continued fraction encounters
difficulties when q is small due to the possible divergence of the series on which
it is based. In such cases the new continued fraction works admirably. Where
“reduction by parts” [10] is advisable it would seem Müller's results will be
better, while if “integration raising p” is preferable, then the new continued
fraction would be necessary. The other methods suggested in the past lacked
in some cases remainder terms; were in other cases too long; were feasible only
in a limited range; or were only approximations. I am particularly indebted
to Professor C. C. Craig under whose guidance this study was completed.

REFERENCES 

[1] L. A. Aroian, “A study of R. A. Fisher's z distribution and the related F distribution,”
unpublished manuscript.

[2] B. H. Camp, “Probability integrals for the point binomial,” Biometrika, Vol. 16 (1924),
pp. 163-171.

[3] A. A. Markoff, Wahrscheinlichkeitsrechnung, translated from the second Russian
edition by Heinrich Liebmann. Leipzig: B. G. Teubner, 1912.

[4] T. Muir, “New general formulae for the transformation of infinite series into continued
fractions,” Roy. Soc. Edinb. Trans., Vol. 27, p. 407.

[5] J. H. Müller, “On the application of continued fractions to the evaluation of certain
integrals, with special reference to the incomplete Beta function,” Biometrika,
Vol. 22 (1930-31), pp. 284-297.

[6] K. Pearson (Editor), Tables for Statisticians and Biometricians, Part II, London:
Biometric Laboratory, 1931, pp. ccxxv-ccxxvi.

[7] K. Pearson, Tables of the Incomplete Beta-Function, London: Biometrika Office, 1934.

[8] M. V. and K. Pearson, “On the numerical evaluation of high order Eulerian integrals,”
Biometrika, Vol. 27 (1935), pp. 409-423.





[9] O. Perron, Die Lehre von den Kettenbrüchen, Leipzig: B. G. Teubner, 1913. (Pages
refer to this edition.)

[10] H. E. Soper, Tracts for Computers No. 7, The Numerical Evaluation of the Incomplete
Beta-Function, London: Cambridge Univ. Press, 1921.

[11] J. V. Uspensky, Introduction to Mathematical Probability, New York: McGraw-Hill
Book Co., 1937.

[12] E. B. Van Vleck, “On the convergence of algebraic continued fractions,” Am. Math.
Soc. Trans., Vol. 5 (1904), pp. 253-282.

[13] J. Wishart, “Determination of $\int_0^{\theta} \cos^{n+1}\theta \, d\theta$ for large values of n, and its application
to the probability integral of symmetrical frequency curves,” Biometrika, Vol.
17 (1925), pp. 68-78.

[14] J. Wishart, “Further remarks on a previous paper,” Biometrika, Vol. 17 (1925), pp.
469-471.

[15] J. Wishart, “On the approximate quadrature of certain skew curves,” Biometrika,
Vol. 19 (1927), pp. 1-38.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON THE DISTRIBUTION OF NON-CENTRAL t 
WITH AN APPLICATION 


By Cecil C. Craig
University of Michigan 


If we adopt the notation recently used by N. L. Johnson and B. L. Welch [1],
non-central t is defined by

$$t = \frac{z + \delta}{\sqrt{w}},$$

in which $\delta$ is a constant and z and w are independent variables, z being distributed
normally about zero with unit variance and w being distributed as $\chi^2/f$, in which f
is the number of degrees of freedom for $\chi^2$.
In the paper referred to, Johnson and Welch discuss some applications of
non-central t and give suitable tables calculated from the probability integral
of the distribution of this variable. Previously tables of this probability integral
for the purpose of calculating the power of the t test had been given by
J. Neyman [2] and Neyman and B. Tokarska [3].

It is the purpose of this note to call attention to a series expansion for the
probability integral of non-central t which is simple in form and in most cases
convenient for direct calculation. As an application of some intrinsic interest,
this series is used to compute in several numerical cases the power of a test
proposed by E. J. G. Pitman [4] based on the randomization principle.

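If for convenience we write,

$$\sqrt{w} = \psi \qquad (0 < \psi < \infty),$$

we have for the joint distribution of $z + \delta$ and $\psi$,

$$df(z + \delta, \psi) = \frac{2 (f/2)^{f/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\, \psi^{f-1}\, e^{-\frac{1}{2}(z^2 + f\psi^2)}\, dz\, d\psi. \tag{1}$$

From this, since $z = t\psi - \delta$,

$$
\begin{aligned}
df(t, \psi) &= \frac{2 (f/2)^{f/2}\, e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\, \psi^{f}\, e^{-\frac{1}{2}(f + t^2)\psi^2 + t\delta\psi}\, dt\, d\psi \\
&= \frac{2 (f/2)^{f/2}\, e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\, \sum_{r=0}^{\infty} \frac{(t\delta)^r}{r!}\, \psi^{f+r}\, e^{-\frac{1}{2}(f + t^2)\psi^2}\, dt\, d\psi.
\end{aligned}
\tag{2}
$$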



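Now this series can be integrated term by term with respect to $\psi$ over its range
and we have,

$$df(t) = \frac{(f/2)^{f/2}\, e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\, \sum_{r=0}^{\infty} \frac{\Gamma\!\left(\frac{f+r+1}{2}\right)}{r!}\, (\delta t)^r \left(\frac{2}{f + t^2}\right)^{(f+r+1)/2} dt. \tag{3}$$

This series converges uniformly in any finite interval for t and it may be integrated
term by term over the entire range for t or over any part of it. In
particular, after some reduction, we get,

$$P(0 < t < t_0 \mid f, \delta) = \int_0^{t_0} df(t) = \frac{e^{-\delta^2/2}}{2} \sum_{r=0}^{\infty} \frac{(\delta^2/2)^{r/2}}{\Gamma(r/2 + 1)}\, I\!\left(\frac{r+1}{2}, \frac{f}{2};\ \frac{t_0^2}{f + t_0^2}\right), \tag{4}$$

in which $I\!\left((r+1)/2,\ f/2;\ t_0^2/(f + t_0^2)\right)$ is the incomplete Beta-function in the notation
of Karl Pearson. Often what is wanted is

$$P(-t_0 < t < t_0) = e^{-\delta^2/2} \sum_{r = 0, 2, 4, \ldots} \frac{(\delta^2/2)^{r/2}}{\Gamma(r/2 + 1)}\, I\!\left(\frac{r+1}{2}, \frac{f}{2};\ \frac{t_0^2}{f + t_0^2}\right). \tag{5}$$

Since the incomplete Beta-function is numerically less than unity it is seen
that the series (4) or (5) converges rapidly for moderate values of $\delta$ such as will
ordinarily occur in applications for small samples. The use of Pearson's tables
of I(p, q; x) will be convenient since interpolation will be required for only one
of the three arguments.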

As an application let us consider the test proposed by Pitman in the paper
referred to above. Two independent samples, $x_1, x_2, \ldots, x_{N_1}$, and $y_1, y_2, \ldots, y_{N_2}$,
have been drawn and it is desired, in the absence of any information about
the two populations from which the samples came, to test the hypothesis that
they have equal means. A test based on what may be termed the principle of
randomization for this situation has been discussed by R. A. Fisher [5] and by
E. S. Pearson [6]. It is as follows: Let the combined sample of $N_1 + N_2$ observations
be separated into sets of $N_1$ observations, $u_1, u_2, \ldots, u_{N_1}$, and $N_2$
observations, $v_1, v_2, \ldots, v_{N_2}$, in all possible ways. For each such separation
let the numerical difference of the means, $|\bar{u} - \bar{v}|$, be the spread. Then for a
suitably chosen $\alpha > 0$, we will reject the hypothesis of equal means if fewer than
$100\alpha\%$ of the spreads exceed $|\bar{x} - \bar{y}|$, and otherwise not. It is clear
that this test is fiducially valid independently of the populations actually sampled
in the sense that if it be consistently followed for all such samples, the proportion
of cases when the hypothesis is rejected when it is true will statistically approach $\alpha$.
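The separation procedure just described can be sketched by direct enumeration. The code below is a minimal illustration, not Pitman's own computation: the function name is hypothetical, and it reports the proportion of separations whose spread is at least as large as the observed one (ties counted as "at least as large", which is an assumed convention).

```python
from itertools import combinations

def randomization_proportion(x, y):
    """Proportion of all separations of the pooled sample into sets of
    len(x) and len(y) whose spread |mean(u) - mean(v)| is at least as
    large as the observed spread |mean(x) - mean(y)|."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    total = sum(pooled)
    observed = abs(sum(x) / n1 - sum(y) / (n - n1))
    extreme = 0
    separations = 0
    for idx in combinations(range(n), n1):
        sum_u = sum(pooled[i] for i in idx)
        spread = abs(sum_u / n1 - (total - sum_u) / (n - n1))
        separations += 1
        # small tolerance so that ties are not lost to rounding
        if spread >= observed - 1e-12:
            extreme += 1
    return extreme / separations

# Two well-separated samples of three: only the observed split and its
# mirror image reach the observed spread, out of 20 separations.
print(randomization_proportion([1, 2, 3], [10, 11, 12]))
```

The enumeration grows as the binomial coefficient of the sample sizes, which is why an approximation to the distribution of the spreads becomes attractive for all but very small samples.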

For all but very small samples it is very tedious to calculate the $_{N_1+N_2}C_{N_1}$





spreads and Pitman in his discussion shows that for quite moderate values of
$N_1$ and $N_2$ the quantity,

$$w = \frac{\dfrac{N_1 N_2}{(N_1 + N_2)^2}\,(\bar{u} - \bar{v})^2}{\dfrac{S(x - \bar{x})^2 + S(y - \bar{y})^2}{N_1 + N_2} + \dfrac{N_1 N_2}{(N_1 + N_2)^2}\,(\bar{x} - \bar{y})^2}$$

has a distribution which in all but very exceptional cases is quite well approximated
by a $B(\tfrac{1}{2}, \tfrac{1}{2}(N_1 + N_2 - 2))$-function. That is, the distribution of w for
the $_{N_1+N_2}C_{N_1}$ spreads may for practical purposes be found from that of t, by a
simple transformation, with $N_1 + N_2 - 2$ degrees of freedom.

It seems pertinent to make some inquiry into the power of such a test, that is,
to make an attempt to learn something about the probability that such a test
will fail to reject the hypothesis of equal means when it is in fact false. To do
this it is now necessary to specify the populations which have actually been
sampled. If we suppose that these populations are normal with equal variances
but with unequal means which, with no loss of generality, may be taken to be $\mu$
and $-\mu$ respectively, the probability integral of the distribution of non-central
t will give our answer.

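If we set

$$\frac{t^2}{f + t^2} = \frac{\tau^2}{\bar{s}^2 + \tau^2},$$

with $\tau = \sqrt{N_1 N_2}\,(\bar{u} - \bar{v})/(N_1 + N_2)$, we have

$$t = \sqrt{f}\,\tau/\bar{s}.$$

Also,

$$\bar{s}^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2} \cdot \frac{N_1 + N_2 - 2}{N_1 + N_2} = \frac{f}{N_1 + N_2}\, s^2,$$

in which $s^2$ is the usual estimate of the population variance $\sigma^2$ based on $f = N_1 + N_2 - 2$
degrees of freedom. Then

$$t = \frac{\bar{u} - \bar{v}}{s} \sqrt{\frac{N_1 N_2}{N_1 + N_2}},$$

and this is a central t if $\mu = -\mu = 0$, otherwise it is non-central. In the latter
case we write (the test is made on $\bar{x} - \bar{y}$),

$$t = \frac{(\bar{x} - \mu) - (\bar{y} + \mu) + 2\mu}{s} \sqrt{\frac{N_1 N_2}{N_1 + N_2}} = \frac{z + \delta}{\psi},$$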





in which,

$$z = \frac{(\bar{x} - \mu) - (\bar{y} + \mu)}{\sigma} \sqrt{\frac{N_1 N_2}{N_1 + N_2}}, \qquad \psi = s/\sigma,$$

and

$$\delta = \frac{2\mu}{\sigma} \sqrt{\frac{N_1 N_2}{N_1 + N_2}}.$$

In applying Pitman's test for a given significance level $\alpha$, one determines
whether or not

$$P(w > w_0) \le \alpha,$$

$w_0$ being the value of w calculated from the sample. This is equivalent to finding

$$P(t^2 > t_0^2),$$

for the proper f, in which

$$\frac{t_0^2}{f + t_0^2} = w_0,$$

and this can be found from an ordinary table of the probability integral of the
t-distribution.

For a numerical example let $N_1 = N_2 = 10$ so that f = 18. If we adopt a 5%
significance level we have $t_0^2 = 2.101^2$ for the critical value. Let us suppose that
$\mu/\sigma = 0.1$, and calculate the probability that the hypothesis that $\mu = 0$ will be
rejected. We have $\delta = 0.2\sqrt{5}$, so that $\delta^2/2 = 0.1$, and

$$\frac{t_0^2}{f + t_0^2} = 0.1969.$$

Then

$$P(t^2 < t_0^2) = e^{-0.1}\left[I(0.5, 9; 0.1969) + 0.1\, I(1.5, 9; 0.1969) + \frac{0.1^2}{2!}\, I(2.5, 9; 0.1969) + \cdots\right] = 0.9292.$$

Four terms of the series were enough to give this result. The probability of
rejecting the hypothesis in this case is thus 0.0708.
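The numerical example can be checked with a short sketch of the two-sided series, in which the term of index k carries the weight $(\delta^2/2)^k/k!$ and the incomplete Beta-function $I(k + \tfrac{1}{2}, f/2; x)$. The helper names are hypothetical, and the incomplete Beta-function is obtained here by Simpson's rule after the substitution $t = \sin^2\theta$; that is an assumption made for brevity, not the table look-up used in the text. The quadrature is only intended for the half-integer first argument and integer f that occur in this series.

```python
import math

def inc_beta(a, b, x, steps=2000):
    """I(a, b; x) by Simpson's rule on 2 * sin^(2a-1) * cos^(2b-1) over
    [0, asin(sqrt(x))]; 'steps' must be even."""
    theta = math.asin(math.sqrt(x))

    def g(th):
        return math.sin(th) ** (2 * a - 1) * math.cos(th) ** (2 * b - 1)

    h = theta / steps
    s = g(0.0) + g(theta)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    integral = 2.0 * s * h / 3.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return integral / math.exp(log_beta)

def prob_abs_t_below(t0, f, delta, terms=30):
    """P(-t0 < t < t0) for non-central t with f degrees of freedom,
    summed from the two-sided series with weights (delta^2/2)^k / k!."""
    lam = delta ** 2 / 2.0
    x = t0 ** 2 / (f + t0 ** 2)
    total = 0.0
    for k in range(terms):
        total += lam ** k / math.factorial(k) * inc_beta(k + 0.5, f / 2.0, x)
    return math.exp(-lam) * total

# N1 = N2 = 10, mu/sigma = 0.1, 5% level (t0 = 2.101, f = 18)
delta = 2 * 0.1 * math.sqrt(10 * 10 / (10 + 10))
p_accept = prob_abs_t_below(2.101, 18, delta)
print(p_accept, 1.0 - p_accept)
```

With these inputs the two printed values should land close to the 0.9292 and 0.0708 quoted in the text.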

The following tables show results for $\alpha$ = 0.05 and 0.01, $\mu/\sigma$ = 0.1, 0.2, and
0.5, and $N_1 = N_2$ = 10 and 20.

Values of $P(t^2 > t_0^2)$

$N_1 = N_2 = 10$

    alpha \ mu/sigma      0.1       0.2       0.5
    0.05                0.0708    0.1355    0.5021
    0.01                0.0105    0.0390    0.2940

$N_1 = N_2 = 20$

    alpha \ mu/sigma      0.1       0.2       0.5
    0.05                0.0947    0.2345    0.8091
    0.01                0.0251    0.0802    0.0730

In only one case was it necessary to calculate as many as ten terms of the
corresponding series to obtain these values.


REFERENCES 

[1] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution,”
Biometrika, Vol. 31 (1940), pp. 362-389.

[2] J. Neyman, “Statistical problems in agricultural experimentation,” Roy. Stat. Soc.
Jour., Supplement, Vol. 2 (1936), pp. 127-136.

[3] J. Neyman and B. Tokarska, “Errors of the second kind in testing ‘Student's’ hypothesis,”
Amer. Stat. Assn. Jour., Vol. 31 (1936), pp. 318-326.

[4] E. J. G. Pitman, “Significance tests which may be applied to samples from any populations,”
Roy. Stat. Soc. Jour., Supplement, Vol. 4 (1937), pp. 119-130.

[5] R. A. Fisher, The Design of Experiments, Oliver and Boyd, Edinburgh, 1935, Section 21.

[6] E. S. Pearson, “Some aspects of the problem of randomization,” Biometrika, Vol. 29
(1937), pp. 53-64.


NOTE ON AN APPLICATION OF RUNS TO QUALITY CONTROL CHARTS

By Frederick Mosteller
Princeton University

In the application of statistical methods to quality control work, a customary 
procedure is to construct a control chart with control limits spaced about the 
mean such that under conditions of statistical control, or random sampling, the 
probability of an observation falling outside these limits is a given $\alpha$ (e.g., .05).
The occurrence of a point outside these limits is taken as an indication of the 
presence of assignable causes of variation in the production line. Such a form 





of chart has been found to be of particular value in the detection of the presence
of assignable causes of variability in the quality of manufactured product. As
recently pointed out, however, the statistician may not only help to detect the
presence of assignable causes, but also help to discover the causes themselves in
the course of further research and development. For this purpose, runs of
different kinds and of different lengths have been found useful by industrial
statisticians.¹ Quality control engineers have found, at least in research and
development work, that a convenient indication of lack of control is the occurrence
of long runs of observations whose values lie above or below that of the
median of the sample. For example (as will be shown below), at least one succession
of 9 or more observations above or below the median in a sample of 40
would be taken as evidence of lack of control at the .05 level; meaning that
under conditions of control such a run would occur in less than 5 per cent
of the samples. Since this type of test has been found useful by quality control
engineers, it is perhaps desirable to discuss the mathematical basis of such tests
of control and provide a brief table for samples of various sizes at the significance
levels .05 and .01.

The general distribution theory of runs of k kinds of elements, and in particular
that of two kinds, has been thoroughly investigated by A. M. Mood.² The
purpose of this note is to give an application of the general method to quality
control.

Let us consider a sample of size 2n drawn from a continuous distribution
function f(x). These are then arranged in the order in which they were drawn.
We now separate the sample into two sets by considering the nth and (n + 1)st
elements in order of magnitude, $x_n$ and $x_{n+1}$; then if $x_i \le x_n$, $x_i$ will be called an a, and if
$x_i \ge x_{n+1}$, $x_i$ will be called a b. A run of a's will be defined as usual as a succession
of a's terminated at each end by the occurrence of a b (with the obvious
exceptions where the run includes the first or last element of the sample), and


¹ The use of “runs up” and “runs down” as well as runs above and below the arithmetic
mean of a sample were briefly described in a paper by W. A. Shewhart, “Contribution of
statistics to the science of engineering,” before the Bicentennial Celebration of the University
of Pennsylvania, September 17, 1940, to be published in the proceedings of that
meeting. In a paper, “Mathematical statistics in mass production,” presented before the
American Mathematical Society in February, 1941, Shewhart discussed some of the advantages
of using runs above and below the median and showed how by comparing runs of
different types in a given problem it is often possible to fix rather definitely the source of
trouble. The present note considers only the frequency of occurrence of “long” runs which
are often used by research and development engineers to indicate the presence of assignable
causes of variation. The occurrence of more than one such run in a given sequence, if distributed
above and below the median value, may also constitute valid evidence of the
presence of more than one state of statistical control between which the phenomena may
oscillate. The interpretation of long runs in this sense, however, is not considered in the
present note.

² A. M. Mood, “The distribution theory of runs,” Annals of Math. Stat., Vol. 11 (1940),
pp. 367-392.





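runs of b's are defined similarly. A run of a's may conveniently be called a
run “below the median,” and a run of b's a run “above the median.”

We shall use Mood's notation throughout, i.e., $r_{1i}$, $r_{2i}$ $(i = 1, 2, \ldots, n)$ are
the number of runs of a's and b's respectively of length i, and $r_1$, $r_2$ are the total
number of runs of a's and b's; $\begin{bmatrix} m \\ m_i \end{bmatrix}$ will indicate a multinomial coefficient, and
$\binom{m}{k}$ a binomial coefficient. Also we define

$$F(r_1, r_2) = 0, \quad |r_1 - r_2| > 1,$$
$$F(r_1, r_2) = 1, \quad |r_1 - r_2| = 1,$$
$$F(r_1, r_2) = 2, \quad |r_1 - r_2| = 0.$$

Then the distribution of runs of a's for our case is

$$P(r_{1i}) = \frac{\begin{bmatrix} r_1 \\ r_{1i} \end{bmatrix} \dbinom{n+1}{r_1}}{\dbinom{2n}{n}}. \tag{1}$$

We would like to find the probability of at least one run of s or more a's. The
coefficient of $x^n$ in

$$\left(x + x^2 + \cdots + x^{s-1}\right)^{r_1} \tag{2}$$

gives the number of ways of partitioning n elements into $r_1$ partitions such that
no partition contains s or more elements, and none is void. Rewriting (2) we
have

$$x^{r_1} \left(\frac{1 - x^{s-1}}{1 - x}\right)^{r_1}$$

and the coefficient of $x^n$ is just

$$\sum_{j \ge 0} (-1)^j \binom{r_1}{j} \binom{n - 1 - j(s-1)}{r_1 - 1}. \tag{3}$$

Then the probability that we desire, of getting at least one run of s or more a's
is immediately given by

$$P(r_{1i} \ge 1,\ i \ge s) = 1 - \frac{\displaystyle\sum_{r_1 \ge 1} \binom{n+1}{r_1} \sum_{j \ge 0} (-1)^j \binom{r_1}{j} \binom{n - 1 - j(s-1)}{r_1 - 1}}{\dbinom{2n}{n}}.$$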



Noting that when j = 0 in the inner summation we have just the total number
of partitions, we get finally

$$P(r_{1i} \ge 1,\ i \ge s) = \frac{\displaystyle\sum_{r_1 \ge 1} \binom{n+1}{r_1} \sum_{j \ge 1} (-1)^{j+1} \binom{r_1}{j} \binom{n - 1 - j(s-1)}{r_1 - 1}}{\dbinom{2n}{n}}. \tag{4}$$

A similar result of course holds for the b's.

If we desire the probability of getting at least one run of s or more of either
a's or b's, we compute the probability of getting no runs of this type and subtract
from unity. Expression (3) multiplied by the total number of ways of
getting no partitions of s or more b's for a given $r_1$, and then summed on $r_1$,
gives exactly the number of ways of getting no runs of either a's or b's as great
as s. This is

$$A = \sum_{r_1 \ge 1} \sum_{r_2 \ge 1} F(r_1, r_2) \left[\sum_{j \ge 0} (-1)^j \binom{r_1}{j} \binom{n - 1 - j(s-1)}{r_1 - 1}\right] \left[\sum_{i \ge 0} (-1)^i \binom{r_2}{i} \binom{n - 1 - i(s-1)}{r_2 - 1}\right], \tag{5}$$

and the probability desired is

$$P(r_{1i} \ge 1 \text{ or } r_{2i} \ge 1 \text{ or both};\ i \ge s) = 1 - \frac{A}{\dbinom{2n}{n}}. \tag{6}$$

In spite of the complex appearance of A, the sum can be rapidly calculated for
any given s, n since the calculations for the sums on i and j need not be duplicated.

In the case of a quality control chart, we set a significance level $\alpha$ for a given n;
this determines s, the length of run of either type necessary for significance at the
level chosen. Suppose we are interested only in runs occurring on one side of
the median, say above, when $\alpha$ = .05, n = 20 (i.e., sample size equal to 40).
We determine the least value of s which will make the right hand side of equation
(4) less than or equal to .05. It turns out that s = 8 for this case. This
means that under conditions of statistical control, i.e., random sampling, one or
more runs of length 8 or more above the median will occur in less than
5 per cent of samples of size 40. Naturally an identical result holds when we
are considering only runs below the median.

On the other hand, if under the same conditions as given above (n = 20,
$\alpha$ = .05), we are using as our criterion of statistical control the occurrence of
runs of length s or greater either above or below the median, we must determine
the least value of s such that $1 - A/\binom{2n}{n} \le .05$. This value turns out to be 9.
In other words under conditions of statistical control at least one run of at least 9
will occur either above or below the median in less than 5 per cent of the cases
on the average.
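The one-sided and two-sided run probabilities can be sketched in exact arithmetic, with the inner alternating sum counting the ways of cutting n like elements into r non-empty runs all shorter than s. The function names are hypothetical, and the brute-force routine is included only as an independent check for small n; it is not part of the method described in the text.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def short_run_partitions(n, r, s):
    """Ways of splitting n like elements into r non-empty runs, each shorter than s."""
    total = 0
    for j in range(r + 1):
        top = n - 1 - j * (s - 1)
        if top < r - 1:
            break  # remaining binomial coefficients vanish
        total += (-1) ** j * comb(r, j) * comb(top, r - 1)
    return total

def p_one_side(n, s):
    """P(at least one run of s or more a's) in a sample of size 2n."""
    none = sum(comb(n + 1, r) * short_run_partitions(n, r, s) for r in range(1, n + 1))
    return 1 - Fraction(none, comb(2 * n, n))

def p_either_side(n, s):
    """P(at least one run of s or more a's or b's) in a sample of size 2n."""
    c = [0] + [short_run_partitions(n, r, s) for r in range(1, n + 1)] + [0]
    # F(r1, r2) is 2 when r1 = r2, 1 when they differ by one, 0 otherwise
    a_total = sum(c[r] * (2 * c[r] + c[r - 1] + c[r + 1]) for r in range(1, n + 1))
    return 1 - Fraction(a_total, comb(2 * n, n))

def brute_force(n, s):
    """Direct enumeration of all arrangements; feasible only for small n."""
    one = either = 0
    for pos in combinations(range(2 * n), n):
        is_a = set(pos)
        longest = {True: 0, False: 0}
        run, prev = 0, None
        for k in range(2 * n):
            cur = k in is_a
            run = run + 1 if cur == prev else 1
            prev = cur
            longest[cur] = max(longest[cur], run)
        one += longest[True] >= s
        either += max(longest.values()) >= s
    total = comb(2 * n, n)
    return Fraction(one, total), Fraction(either, total)

print(float(p_one_side(20, 8)), float(p_either_side(20, 9)))
```

Using `Fraction` keeps the alternating sums exact, so the comparison with the enumeration is an equality rather than a tolerance check.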






The following table gives smallest lengths of runs for .05 and .01 significance
levels for samples of size 10, 20, 30, 40, 50.

           Runs on one side of median    Runs on either side of median
    2n     alpha = .05    alpha = .01    alpha = .05    alpha = .01
    10          5              -              5              -
    20          7              8              7              8
    30          8              9              8              9
    40          8              9              9             10
    50          8             10             10             11


If there is an odd number of individuals, say 2n + 1, in the sample, we would
choose the value of the median as the dividing line for our sample and treat the
data as if there were only 2n cases, thus ignoring the median completely.

The following table³ gives the probabilities of getting at least one run of s
or more on one side, either side, and each side of the median for samples of size 10,
20, and 40.









    Length           2n = 10                  2n = 20                  2n = 40
    of Run (s)   One    Either  Each      One    Either  Each      One    Either  Each
                 Side   Side    Side      Side   Side    Side      Side   Side    Side
        1       1.000   1.000   1.000    1.000   1.000   1.000    1.000   1.000   1.000
        2        .976    .992    .960    1.000   1.000   1.000    1.000   1.000   1.000
        3        .500    .667    .333     .870    .956    .784     .992    .999    .986
        4        .143    .230    .056     .457    .640    .274     .799    .930    .668
        5        .024    .040    .008     .178    .293    .064     .450    .650    .249
        6                                 .060    .106    .013     .207    .346    .068
        7                                 .017    .032    .002     .087    .158    .016
        8                                 .004    .007    .000     .034    .065    .004
        9                                 .001    .001    .000     .013    .025    .001
       10                                 .000    .000    .000     .005    .009    .000
       11                                                          .002    .003    .000
       12                                                          .000    .001    .000
       13                                                          .000    .000    .000


One method of computing such a table is to use expression (4) to obtain the
probabilities on one side, and to use (6) to get probabilities for either side.
Then the probabilities for runs on each side may be computed by using the
relationship

2 P (one side) - P (either side) = P (each side).

³ The author is indebted to Dr. P. S. Olmstead of the Bell Telephone Laboratories
for kindly placing this table at his disposal. Dr. Olmstead has pointed out that these
probabilities have been found very useful in research and development work.













TEST OF HOMOGENEITY FOR NORMAL POPULATIONS 


By G. A. Baker 


University of California 

1. Introduction. In biological experiments it is often of interest to test 
whether or not all the subjects can be regarded as coming from the same normal 
population. If they have not come from the same normal population, usually 
the most plausible alternative is that the subjects have come from a population 
which is the combination of two or more normal populations combined in some 
proportions. The combination of normal populations is a "smooth” alternative 
to the hypothesis of a single normal population. Such non-homogeneous popu¬ 
lations are not the only "smooth” alternatives, of course, but are included 
among the "smooth” alternatives. If there is reason to believe that the only 
deviation from a normal population is due to non-homogeneity, then the results 
of Professor Neyman in his paper [1] are available in studying this problem. 

It is desirable not to make any hypotheses about the mean and standard
deviation of the sampled population, but to base all computations and tests on
the data contained in the sample. Such a viewpoint has been stressed in a
previous paper [2] where it was shown that if the sampling is from a normal
population, the probability of a deviation from the mean of a first sample of n
measured in terms of the standard deviation of the sample is proportional to

$$\frac{dv}{\left(1 + \dfrac{v^2}{n+1}\right)^{n/2}}. \tag{1.1}$$

The result (1.1) and Neyman's results give rise to a test of homogeneity which
is valid for “large” samples. Empirical results show that fairly conclusive evidence
of non-homogeneity may be obtained with samples of 100. Samples of 50
or less may be suggestive but rarely decisive.


2. Development of Test. Suppose that a sample of n + 1 is drawn from a
normal population. It can be regarded as being made up of a first sample of n
and a second sample of one. The value of v corresponding to (1.1) can then be
computed and its distribution function is (1.1). This partition, of course, can
be made in n + 1 ways. That is, n + 1 values of v are determined from a
random sample of n + 1 from the original parent. It is true that these values
of v are not independent among themselves. The correlation between the values
of v, to a first approximation at least, is of the order of 1/n and can be neglected
if n is “large.”

A suitable transformation as discussed in [3], [1] and elsewhere, transforms
(1.1) into a rectangular distribution.

If the same computations are made when the sampled population is not 





normal, then the resulting values obtained will not be rectangularly distributed. 
For instance, suppose that the sampled population is 


( 2 . 1 ) 


/(*) = 


vV2t 




we find that the distribution of v based on the first sample of 2 is a very complicated
expression involving sums of exponentials and definite integrals of exponentials.
To obtain a rectangular distribution if the sampled population is
normal, the appropriate transformation to make is

$$v = -\sqrt{3}\,\cot \pi u, \qquad dv = \sqrt{3}\,\pi \csc^2 \pi u \; du. \tag{2.2}$$

The resulting u-distribution for population (2.1) then is to be compared with
the rectangular distribution in the interval from zero to one.

For “large” values of n + 1 and for symmetrical non-homogeneous populations
composed of two normal components, the u-distribution will be symmetrical
about $u = \tfrac{1}{2}$, less than one near the ends, greater than one for values
of u moderately far from $\tfrac{1}{2}$ and less than one for values of u near $\tfrac{1}{2}$. A Neyman
$\Psi_k^2$ [1] of order 4 will be necessary to detect a difference of this sort. If the
non-homogeneous population of two components is skewed, the u-distribution
will still show the same two-humped effect but may be skewed instead of symmetrical.
A Neyman $\Psi_4^2$ of order 4 should still be computed, although $\Psi_3^2$ may
be more significant.

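The test then consists of:

(a) computing the n + 1 quantities

$$v_i = \frac{x_i - \bar{x}_i}{s_i} \qquad (i = 1, 2, 3, \ldots, n + 1)$$

where

n + 1 = number in the sample

$x_i$ = the observed values

$\bar{x}_i$ = the mean of the observed values except $x_i$

$s_i^2 = \dfrac{1}{n} \sum_{j \ne i} (x_j - \bar{x}_i)^2$

(b) making the transformation

$$u_i = \int_{-\infty}^{v_i} p(v)\, dv \qquad (i = 1, 2, 3, \ldots, n + 1)$$

in which p(v) is the distribution (1.1) reduced to unit total area

(c) computing the first four $\Psi_k^2$'s of Neyman's paper [1]

(d) comparing $\Psi_k^2$ with $\chi_\epsilon^2(k)$ as found from the Incomplete Gamma Function
Tables.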
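Steps (a) and (b) can be sketched as follows, under the assumption that the transformation in (b) is the probability integral of (1.1). With the substitution $v = \sqrt{n+1}\,\tan\theta$ the density becomes proportional to $\cos^{n-2}\theta$ on a finite interval, which Simpson's rule handles directly. The function names are hypothetical; steps (c) and (d) are not attempted here.

```python
import math

def leave_one_out_v(xs):
    """Step (a): v_i = (x_i - mean of the others) / (s.d. of the others, divisor n)."""
    n = len(xs) - 1
    total = sum(xs)
    vs = []
    for i, xi in enumerate(xs):
        rest = xs[:i] + xs[i + 1:]
        mean = (total - xi) / n
        s = math.sqrt(sum((xj - mean) ** 2 for xj in rest) / n)
        vs.append((xi - mean) / s)  # fails if the remaining values are all equal
    return vs

def _simpson(g, a, b, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    if a == b:
        return 0.0
    h = (b - a) / steps
    s = g(a) + g(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3.0

def u_from_v(v, n):
    """Step (b): probability integral of dv / (1 + v^2/(n+1))^(n/2), for n >= 2."""
    def g(theta):
        return math.cos(theta) ** (n - 2)
    theta_v = math.atan(v / math.sqrt(n + 1))
    return _simpson(g, -math.pi / 2, theta_v) / _simpson(g, -math.pi / 2, math.pi / 2)

sample = [0.0, 1.0, 2.0]           # n + 1 = 3, so n = 2
vs = leave_one_out_v(sample)
us = [u_from_v(v, len(sample) - 1) for v in vs]
print(vs, us)
```

For n = 2 the result can be checked against (2.2): each pair satisfies v = -√3 cot πu.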





If n is large, say n = 100, then u is given approximately by the normal
probability integral.

If n is small, the values of u are obtained from Table 25 of Vol. 2 of
Pearson's Tables.

Neyman's derivation assumes that n + 1 is large and that the u's are independent.
In this case, if n + 1 is large, then the u's are nearly independent,
and hence the test is valid. The same procedure can be applied for smaller
samples. It cannot be expected that small differences from normal in the
sampled population can be detected with small samples. Empirical results
indicate that samples of 100 are necessary for decisive results even when the
differences of the sampled population from a normal homogeneous population
are large. Samples of 50 may be suggestive and in very extreme cases might be
decisive.


TABLE I

Empirical Sampling Results

                                                   k = 1     k = 2     k = 3     k = 4
    $\Psi_k^2$'s for 51 from population A           .0001      .843     2.009     7.464
    $\Psi_k^2$'s for 101 from population A          .086      2.403     4.998    12.868
    $\Psi_k^2$'s for 101 from population B          .553       .927     7.472     7.485
    $\Psi_k^2$'s for 101 from normal                .017       .082     1.288     1.663
    $\chi_{.05}^2(k)$'s (Neyman [1])               3.842      5.992     7.815     9.488
    $\chi_{.01}^2(k)$'s (Neyman [1])               6.635      9.210    11.345    13.277


It is to be noted that the test makes no assumption about the parameters of
the sampled population and does not group the data. The application of the
test gives a unique result that does not depend on the judgment of the computer
in any respect. In applying the usual chi-square test the computer must choose
groupings. The choice of groupings as indicated in [5] may change the P-values
to very different levels of significance.

3. Empirical results. Samples of 51 and 101 from population A, of 101 from
population B, and of 101 from a normal population, were drawn by throwing
dice. Populations A and B are given in [4]. Population A is symmetrical and
distinctly bimodal. Population B is weakly bimodal and strongly skewed.

For samples from population A it is necessary to compute $\Psi_4^2$. For samples
from population B it may be sufficient to compute $\Psi_3^2$. The non-homogeneity
of the type of population A seems to be somewhat more detectable than of the
type of population B. The sample from the normal parent shows close conformity
with expectation.

In applying the proposed test for homogeneity the u-values for small independent
sets of data can be combined to give a much larger number of u-values.











REFERENCES 

[1] J. Neyman, “‘Smooth’ tests for goodness of fit,” Skandinavisk Aktuarietidskrift, (1937),
pp. 149-199.

[2] G. A. Baker, “The probability that the mean of a second sample will differ from the
mean of a first sample by less than a certain multiple of the standard deviation
of the first sample,” Annals of Math. Stat., Vol. 6 (1935), pp. 197-201.

[3] G. A. Baker, “Transformations of bimodal distributions,” Annals of Math. Stat.,
Vol. 1 (1930), pp. 334-344.

[4] G. A. Baker, “The relation between the means and variances, means squared and
variance in samples from the combinations of normal populations,” Annals of
Math. Stat., Vol. 2 (1931), pp. 333-354.

[5] G. A. Baker, “The significance of the product-moment coefficient of correlation with
special reference to the character of the marginal distributions,” Jour. Am.
Stat. Assoc., Vol. 25 (1930), pp. 387-396.


A NOTE ON THE POWER OF THE SIGN TEST 

By W. Mac Stewart 
University of Wisconsin 

1. Introduction. Let us consider a set of N non-zero differences, of which x
are positive and N - x are negative; and suppose that the hypothesis tested,
$H_0$, implies, in independent sampling, that x will be distributed about an expected
value of N/2 in accordance with the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^N$. As a quick
test of $H_0$, we may choose to test the hypothesis $h_0$ that x has the above probability
distribution. Defining r to be the smaller of x and N - x, the test consists
in rejecting $h_0$ and therefore $H_0$ whenever $r \le r(\epsilon, N)$, where $r(\epsilon, N)$ is
determined by N and the significance level $\epsilon$.

2. Power of a test. In applying such a test it is of interest to know how
frequently it will lead to a rejection of $H_0$ when $H_0$ is false and the situation $H_i$
implies that the probability law of x is $(q + p)^N$, with $p \ne \tfrac{1}{2}$, thereby indicating
an expectation of an unequal number of + and - differences. The probability
of rejecting $H_0$ when $H_i$, implying $p = p_i$, is true, is termed the power of
the test of $H_0$ relative to the alternative $H_i$.¹ Thus, from the point of view of
experimental design the power (P) of the test of $H_0$ may be considered a function
of the alternative hypothesis $H_i$, the significance level $\epsilon$, and N. As such,
the following observations may be noted:

1. The power $P_2$, for an assumed $\epsilon$, N, and $H_i$ implying $p = p_2$, is greater
than or equal to the power $P_1$ for $\epsilon$, N and $H_i$ implying $p = p_1$, where
$|p_2 - .50| > |p_1 - .50|$.


¹ For an extensive discussion of the power of a test, the reader is referred to J. Neyman
and E. S. Pearson, Statistical Research Memoirs, Vol. 1 (1936), pp. 3-6.





2. The power $P_2$ for an assumed $H_i$, N, and $\epsilon_2$ is greater than or equal to the
power $P_1$ for $H_i$, N, and $\epsilon_1$, where $\epsilon_2 > \epsilon_1$.

3. The power $P_2$ for an assumed $H_i$, $\epsilon$, and $N_2$ is greater than or equal to the
power $P_1$ for $H_i$, $\epsilon$, and $N_1$, where $N_2 > N_1$.

Hence, to increase the power of the test of $H_0$ relative to a particular $H_i$,
the methods implied in observations 2 and/or 3 may be employed. However,
if any increase in an established $\epsilon$ is undesirable, the method implied in observation
3 is the alternative.

3. Explanation of table. In the interests of efficiency and economy, two
questions then arise: (1) What is the minimum value of N, which, at the significance
level $\epsilon$, will give the test of $H_0$ a power $P > \beta$, relative to a particular
alternative hypothesis $H_i$? (2) For this minimum value of N corresponding
to $\epsilon$, what is the maximum value of r? Stated in another manner, the questions
are these: (1) “What is the smallest number (min N) of paired samples that must
be employed in conjunction with the Sign Test in order that the test of $H_0$,
at the significance level $\epsilon$, shall have a power $P > \beta$ relative to an alternative
hypothesis $H_i$?” (2) “If x of these paired samples give rise to a positive difference,
and (min N - x) a negative difference, and if r be defined as the smaller
of x and (min N - x), then what is the maximum value that r may attain and
still have the results, at the level $\epsilon$, judged significant?”

Table I provides the answers to these questions for the significance level
$\epsilon$ = .05; and (1) for $H_i$ implying $p = p_i$ for values of $p_i$ from .60 to .95 (and
thereby from .40 to .05) at intervals of .05; (2) for values of $\beta$ from .05 to .95
at intervals of .05, and also for $\beta$ = .99. For example, assume that a power
P > .80 relative to the alternative hypothesis $H_3$ ($p_i$ = .70) is desired. In
Table I, the entry appearing in the column headed $H_3$ ($p_i$ = .70), and in the
row P > .80 is 49,17, indicating that 49 paired samples are required, of which
17 or less must be of one sign (+ or -) and hence 32 or more must be of the
opposite sign in order that the results be significant at the .05 level.
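The quantities discussed here can be sketched directly from the binomial law. The code below assumes a two-tailed rejection region r ≤ r(ε, N), with the significance level taken as twice the lower tail of the binomial with p = 1/2, and defines the power as the probability of falling in that region when p takes the alternative value. The function names are hypothetical, and the search in `smallest_n` is a plain scan that is not claimed to reproduce the construction of Table I.

```python
from math import comb

def critical_r(n, eps=0.05):
    """Largest r with 2 * P(X <= r | p = 1/2) <= eps, or None if no r qualifies."""
    best = None
    cum = 0
    for r in range((n - 1) // 2 + 1):   # keep r < n - r so the two tails are disjoint
        cum += comb(n, r)
        if 2 * cum / 2 ** n <= eps:
            best = r
        else:
            break
    return best

def power(n, r, p):
    """P(min(x, n - x) <= r) when x is binomial with parameters n and p."""
    q = 1.0 - p
    return sum(comb(n, x) * p ** x * q ** (n - x)
               for x in range(n + 1) if min(x, n - x) <= r)

def smallest_n(p, beta, eps=0.05, n_max=500):
    """First N (scanning upward) whose test at level eps has power >= beta."""
    for n in range(1, n_max + 1):
        r = critical_r(n, eps)
        if r is not None and power(n, r, p) >= beta:
            return n, r
    return None

r49 = critical_r(49)
print(r49, power(49, r49, 0.70))
```

For N = 49 the scan gives the critical value 17 quoted in the example, and the printed power can be compared with the stated requirement P > .80.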

Because of the discreteness of the binomial distribution, it is impossible to
maintain the level of significance at .05 or even arbitrarily close to that figure
and still hold to the criterion that N shall be at a minimum. For that reason,
particularly when min N is small, results significant at .05 according to Table I
may be significant at a level $\epsilon'$ where $\epsilon'$ is considerably less than .05. In general,
however, and in particular when min N is large (greater than 50), both the
quantities $(.05 - \epsilon')$ and $(P - \beta)$ are small.

4. Illustrative example. Goulden² describes a simple experiment in identifying
varieties of wheat. In this experiment, a wheat “expert” is presented
paired grain samples of two particular varieties of wheat. The object of the


² C. H. Goulden, Methods of Statistical Analysis, John Wiley and Sons, New York, 1939.





experiment is to test the ability of the expert to differentiate between the two
varieties by arranging the pairs so that samples of one variety are on the left,
say, and samples of the other variety are on the right.

In a problem of this type, it is desirable to have a sufficiently large number, N,
of paired samples in order that the following conditions be fulfilled: (1) The
probability that a person possessing no discriminating ability pass the test

TABLE I

Minimum number of paired samples and maximum values of related r

$H_0$: $p_0$ = .50

(5% level of significance, i.e., $\epsilon$ < .05)
(min N, max r)

    Power          H_8      H_7      H_6      H_5      H_4      H_3      H_2      H_1
                  p=.95    p=.90    p=.85    p=.80    p=.75    p=.70    p=.65    p=.60
    0 < P < .05     -        -        -        -        -        -      7,0      6,0
    P > .05         -        -        -        -        -      7,0      6,0      9,1
    P > .10         -        -        -        -      7,0      6,0      9,1     17,4
    P > .15         -        -        -      8,0      6,0      9,1     12,2     25,7
    P > .20         -        -        -      7,0     10,1     13,2     17,4    37,12
    P > .25         -        -      8,0      6,0     14,2     12,2     23,6    44,15
    P > .30         -        -      7,0     11,1      9,1     18,4     25,7    56,20
    P > .35         -        -      6,0     10,1     12,2     17,4     30,9    65,24
    P > .40         -      8,0        -      9,1     16,3     20,5    35,11    74,28
    P > .45         -      7,0     11,1        -     15,3     26,7    42,14    89,35
    P > .50         -      6,0     10,1     13,2     18,4     25,7    44,15   101,40
    P > .55         -        -      9,1     12,2     17,4     30,9    51,18   112,45
    P > .60         -        -     14,2     15,3     20,5    36,11    56,20   125,51
    P > .65       7,0     11,1     13,2     19,4     23,6    35,11    63,23   143,59
    P > .70       6,0     10,1     12,2     18,4     25,7    40,13    67,25   158,66
    P > .75         -      9,1     16,3     17,4     28,8    44,15    79,30   175,74
    P > .80         -     14,2     15,3     20,5     30,9    49,17    90,35   199,85
    P > .85      11,1     12,2     18,4     25,7    35,11    56,20   101,40   227,98
    P > .90       9,1     15,3     17,4     28,8    42,14    65,24   114,46  263,115
    P > .95      12,2     17,4     23,6    35,11    49,17    79,30   143,59  327,145
    P > .99      15,3     23,6     30,9    44,15    67,25   110,44   199,85  453,205

through sheer guesswork be less than ε; and (2) if past experience has proven
that an expert does possess the ability to discriminate between the varieties to
the extent of placing a proportion, $p_1$, of the pairs correctly in the long run,
then the probability that he will pass the test be P. 

Under these conditions, how large an N is required, and for that N, what is 
the maximum number of pairs that may be incorrectly placed without failing 





the test? For alternative hypothesis $H_4$ ($p_1 = .75$), and for P > .90, referring
to Table I, it is seen that 42 paired samples must be employed and not more
than 14 may be placed incorrectly. Under the same alternative hypothesis, if
it be required merely that P > .50 (i.e., an expert with an ability of .75 have
better than an even chance of passing), then only 18 paired samples are necessary
and not more than 4 may be arranged incorrectly.

Thus, before conducting an experiment in which the Sign Test is to be employed,
if the experimenter first decides what power the test must have relative
to a certain alternative hypothesis, then from the accompanying table he may
learn the minimum number of paired samples that are necessary and the related
maximum value of r.

If this procedure is not followed, and an experimenter employs, say, 6 paired
samples, he may (as can be seen from the table) discover, to his dismay, that
“experts” of ability .75 will be unrecognized more than 80% of the time.


MOMENTS OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE 
DIFFERENCE TO THE MEAN SQUARE DIFFERENCE IN 
SAMPLES FROM A NORMAL UNIVERSE 

By J. D. Williams 
Phoenix, Arizona

The following result may have considerable application to trend analysis. 
The specific problem was proposed to me by R. H. Kent. 

Consider a sample $O_n$: $X_1, X_2, \ldots, X_n$ from a normal population with zero
mean and variance $\sigma^2$, the variates being arranged in temporal order. We seek
the moments of the ratio of $\delta^2$ to $S^2$, where

(1)    $(n - 1)\delta^2 = \sum_{i=1}^{n-1} (X_i - X_{i+1})^2$

and

(2)    $nS^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2$.

Here $\bar{X}$ is the mean of the $X_i$. In order to simplify the algebra, we will work
with quantities A and B defined by

(3)    $2\sigma^2 A = (n - 1)\delta^2$,    $2\sigma^2 B = nS^2$.

The characteristic function for the joint distribution of A and B is

(4)    $\varphi(t_1, t_2) = E(e^{At_1 + Bt_2}) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(At_1 + Bt_2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} X_i^2\right) dX_1 \cdots dX_n$,

where $t_1$ and $t_2$ are pure imaginaries. For the method of analysis which will be
used here $t_1$ and $t_2$ will be considered as real variables. By straightforward
methods we have


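(5)    $\varphi^{-2}(t_1, t_2) = \begin{vmatrix} a & b & d & \cdots & d & d \\ b & c & b & \cdots & d & d \\ d & b & c & \cdots & d & d \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ d & d & d & \cdots & c & b \\ d & d & d & \cdots & b & a \end{vmatrix}$,

where the determinant is of nth order and its elements are

$a = 1 - t_1 - (n - 1)T$
$b = t_1 + T$
$c = 1 - 2t_1 - (n - 1)T$
$d = T = t_2/n$.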

It can be verified that the determinant has the value 


( 6 ) 


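(7)    $\varphi^{-2}(t_1, t_2) = \sum_{i=0}^{n-1} \binom{2n - i - 1}{i} (-t_1)^i (1 - t_2)^{n-i-1}$,

where $\binom{2n - i - 1}{i}$ represents a binomial coefficient. From (7) we
find the moments $m_j$ of A/B as follows: Setting

(8)    $t_2 = \sum_{k=1}^{j} u_k$,

we have

(9)    $m_j = \int_{-\infty}^{0} \cdots \int_{-\infty}^{0} \left[\frac{\partial^j \varphi(t_1, t_2)}{\partial t_1^j}\right]_{t_1=0} \prod_{k=1}^{j} du_k = \frac{2^j \left[\partial^j \varphi(t_1, 0)/\partial t_1^j\right]_{t_1=0}}{(n - 1)(n + 1) \cdots (n + 2j - 3)}$.

The result is rather unexpected, for we have established that the moments of
A/B are equal to the moments of A divided by the moments of B.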





We find the following explicit values for the first few moments $m_j$:

(10)
$m_0 = 1$
$m_1 = 2$
$(n - 1)(n + 1)\,m_2 = 4(n^2 + n - 3)$
$(n - 1)(n + 1)(n + 3)\,m_3 = 8(n^3 + 6n^2 + 2n - 21)$
$(n - 1)(n + 1)(n + 3)(n + 5)\,m_4 = 16(n^4 + 14n^3 + 53n^2 - 8n - 231)$.

These are valid subject to the restriction $2n - 1 > j$, because in arriving at the
explicit forms we have treated the binomial coefficient $\binom{k}{j}$ as if it were identically
equal to $k(k - 1) \cdots (k - j + 1)/j!$.

From (10) it is easy to pass to the moments of $R = \delta^2/S^2$. For example, we
find the mean value and variance of R to be

$\frac{2n}{n - 1}$

and

$\frac{4n^2(n - 2)}{(n + 1)(n - 1)^3}$

respectively.
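
The explicit values in (10) can be cross-checked numerically. The following is only a sketch under assumptions that are not in the note itself: it writes A as a weighted sum of independent squared standard normal variates with weights $1 - \cos(k\pi/n)$, $k = 1, \ldots, n - 1$ (the eigenvalues of half the second-difference matrix), obtains the raw moments of A from its cumulants, and divides by the moments of B, taking 2B to be a chi-square variate with n − 1 degrees of freedom. The function names are hypothetical.

```python
import math


def moments_of_A(n, jmax):
    """Raw moments E(A^j), j = 0..jmax, with A = sum_k w_k z_k^2, z_k ~ N(0, 1)."""
    w = [1.0 - math.cos(k * math.pi / n) for k in range(1, n)]
    # r-th cumulant of a weighted sum of chi-square(1) variates
    kappa = [0.0] + [
        2 ** (r - 1) * math.factorial(r - 1) * sum(x**r for x in w)
        for r in range(1, jmax + 1)
    ]
    raw = [1.0]
    for j in range(1, jmax + 1):
        # standard recursion from cumulants to raw moments
        raw.append(sum(math.comb(j - 1, i) * kappa[j - i] * raw[i] for i in range(j)))
    return raw


def ratio_moments(n, jmax):
    """m_j = E(A^j) / E(B^j), using the result that follows equation (9)."""
    raw = moments_of_A(n, jmax)
    out, eb = [], 1.0
    for j in range(jmax + 1):
        if j > 0:
            eb *= (n - 1 + 2 * (j - 1)) / 2.0  # E(B^j) = prod_k ((n - 1)/2 + k)
        out.append(raw[j] / eb)
    return out


def formula_10(n):
    """The explicit values m_0..m_4 as printed in (10)."""
    d2 = (n - 1) * (n + 1)
    d3 = d2 * (n + 3)
    d4 = d3 * (n + 5)
    return [
        1.0,
        2.0,
        4 * (n**2 + n - 3) / d2,
        8 * (n**3 + 6 * n**2 + 2 * n - 21) / d3,
        16 * (n**4 + 14 * n**3 + 53 * n**2 - 8 * n - 231) / d4,
    ]
```

Comparing `ratio_moments(n, 4)` with `formula_10(n)` for n of 3 or more (so that the restriction 2n − 1 > j holds for every j up to 4) gives an independent check of the polynomial coefficients.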




THE ANNALS
of
MATHEMATICAL
STATISTICS

(founded by H. C. Carver)

The Official Journal of the Institute
of Mathematical Statistics


Contents

On the Integral Equation of Renewal Theory. Willy Feller ... 243
On the Joint Distribution of the Medians in Samples from a Multivariate Population. A. M. Mood ... 268
Samples from Two Bivariate Normal Populations. Chung Tsi Hsu ... 279
On Randomness in Ordered Sequences. L. C. Young ... 293
On Certain Likelihood-Ratio Tests Associated with the Exponential Distribution. Edward Paulson ... 301
On the Mathematically Significant Figures in the Solution of Simultaneous Linear Equations. L. B. Tuckerman ... 307
On Mechanical Tabulation of Polynomials. J. C. McPherson ... 317
On the Probability of the Occurrence of at Least m Events Among n Arbitrary Events. Kai Lai Chung ... 328

Notes:

A Note on Sheppard's Correction. Cecil C. Craig ... 339
On the Analysis of Variance in Case of Multiple Classifications with Unequal Class Frequencies. Abraham Wald ... 346
The Frequency Distribution of a General Matching Problem. T. N. E. Greville ... 350
On Methods of Solving Normal Equations. Paul G. Hoel ... 354
Conditions that the Roots of a Polynomial be Less Than Unity in Absolute Value. Paul A. Samuelson ... 360
Values of Mills' Ratio of Area to Bounding Ordinate and of the Normal Probability Integral for Large Values of the Argument. Robert D. Gordon ... 364


Vol. XII, No. 3, September, 1941



















ON THE INTEGRAL EQUATION OF RENEWAL THEORY 

By Willy Feller 
Brown University

1. Introduction. In this paper we consider the behavior of the solutions of
the integral equation

(1.1)    $u(t) = g(t) + \int_0^t u(t - x) f(x)\,dx$,

where f(t) and g(t) are given non-negative functions.¹ This equation appears,
under different forms, in population theory, the theory of industrial replacement
and in the general theory of self-renewing aggregates, and a great number of
papers have been written on the subject.² Unfortunately most of this literature
is of a heuristic nature so that the precise conditions for the validity of different
methods or statements are seldom known. This literature is, moreover, abundant
in controversies and different conjectures which are sometimes supported
or disproved by unnecessarily complicated examples. All this renders an
orientation exceedingly difficult, and it may therefore be of interest to give a
rigorous presentation of the theory. It will be seen that some of the previously
announced results need modifications to become correct.

The existence of a solution u(t) of (1.1) could be deduced directly from a well-known
result of Paley and Wiener [21] on general integral equations of form
(1.1).³ However, the case of non-negative functions f(t) and g(t), with which
we are here concerned, is much too simple to justify the deep methods used by
Paley and Wiener in the general case. Under the present conditions, the existence
of a solution can be proved in a simple way using properties of completely
monotone functions, and this method has also the distinct advantage of showing
some properties of the solutions, which otherwise would have to be proved
separately. It will be seen in section 3 that the existence proof becomes most
natural if equation (1.1) is slightly generalized. Introducing the summatory
functions

(1.2)    $U(t) = \int_0^t u(x)\,dx$,    $F(t) = \int_0^t f(x)\,dx$,    $G(t) = \int_0^t g(x)\,dx$,

¹ For the interpretation of the equation cf. section 2.

² Lotka's paper [8] contains a bibliography of 74 papers on our subject published before
1939. Yet it is stated that even this list "is not the result of an exhaustive search." At
the end of the present paper the reader will find a list of 16 papers on (1.1) which have
appeared during the two years since the publication of Lotka's paper.

³ This has been remarked also by Hadwiger [3].



equation (1.1) can be rewritten in the form

(1.3)    $U(t) = G(t) + \int_0^t U(t - x)\,dF(x)$.


However, (1.3) has a meaning even if F(t) and G(t) are not integrals, provided
F(t) is of bounded total variation and the integral is interpreted as a Stieltjes
integral. Now for many practical applications (and even for numerical calculations)
this generalized form of the integral equation seems to be the most
appropriate one and, as a matter of fact, it has sometimes been used in a more or
less hidden form (e.g., if all individuals of the parent population are of the same
age). Our existence theorem refers to this generalized equation.

We then turn to one of the main problems of the theory, namely the asymptotic
behavior of u(t) as $t \to \infty$. It is generally supposed that the solution u(t)
“in general” either behaves like an exponential function, or that it approaches
in an oscillating manner a finite limit q; the latter case should arise if $\int_0^\infty f(t)\,dt = 1$,
thus in particular in the cases of a stable population and of industrial replacement.
However, special examples have been constructed to show that this is
not always so.⁴ In order to simplify the problem and to get more general conditions,
we shall first (section 4) consider only the question of convergence in mean,
that is to say, we shall study the asymptotic behavior not of u(t) itself but of
the mean value $u^*(t) = \frac{1}{t}\int_0^t u(x)\,dx$. The question can be solved completely
using only the simplest Tauberian theorems for Laplace integrals. Of course,
if $u(t) \to q$ then also $u^*(t) \to q$, but not conversely. The investigation of the
precise asymptotic behavior of u(t) is more delicate and requires more refined
tools (section 5).

Most of section 6 is devoted to a study of Lotka’s well-known method of 
expanding u(t) into a series of oscillatory components, and it is hoped that this 
study will help clarify the true nature of this expansion. It will be seen that 
Lotka’s method can be justified (with some necessary modifications) even in 
some cases for which it was not intended, e.g., if the characteristic equation has 
multiple or negative real roots, or if it has only a finite number of roots. On 
the other hand limitations of the method will also become apparent; thus it 
can occur in special cases that a formal application of the method will lead to a 
function u(t) which apparently solves the given equation, whereas in reality it 
is the solution of quite a different equation. 

Of course, most of the difficulties mentioned above arise only when the function
f(t) has an infinite tail. However, it is known that even computational
considerations sometimes require the use of such curves, and, as a matter of fact,


⁴ Cf. Hadwiger [2] and also Hadwiger, “Zur Berechnung der Erneuerungsfunktion nach
einer Formel von V. A. Kostitzin,” Mitt. Verein. schweizerischer Versich.-Math., Vol. 34
(1937), pp. 37-43.





exponential and Pearsonian curves have been used most frequently in connection
with (1.1). It will be seen that even in these special cases customary
methods may lead to incorrect results. Besides, our considerations show how
much the solution u(t) is influenced by the values of f(t) for $t \to \infty$, and, accordingly,
that extreme caution is needed in practice. The last section contains
some simple remarks on the practical computation of the solution.

2. Generalities on equations (1.1) and (1.3). This section contains a few
remarks on the meaning of our integral equation and on an alternative form
under which it is encountered in the literature. A reader interested only in the
abstract theory may pass immediately to section 3.

Equation (1.1) can be interpreted in various ways; the most important among
them are the following two:

(i) In the theory of industrial replacement (as outlined in particular by Lotka),
it is assumed that each individual dropping out is immediately replaced by a
new member of zero age. f(t) denotes the density of the probability at the
moment of installment that an individual will drop out at age t. The function
g(t) is defined by

(2.1)    $g(t) = \int \eta(x) f(t - x)\,dx$,

where $\eta(x)$ represents the age distribution of the population at the moment
t = 0 (so that the number of individuals of an age between x and $x + \delta x$ is
$\eta(x)\,\delta x + o(\delta x)$). Obviously g(t) then represents the rate of dropping out at
time t of individuals belonging to the parent population. Finally, u(t) denotes
the rate of dropping out at time t of individuals of the total population. Now
each individual dropping out at time t belongs either to the parent population,
or it came to the population by the process of replacement at some moment
t − x (0 < x < t), and hence u(t) satisfies (1.1). It is worthwhile to note that
in this case

(2.2)    $\int_0^\infty f(t)\,dt = 1$,

since f(t) represents a density of probability.

(ii) In population theory u(t) measures the rate of female births at time t > 0.
The function f(t) now represents the reproduction rate of females at age t (that
is to say, the average number of female descendants born during $(t, t + \delta t)$
from a female of age t is $f(t)\,\delta t + o(\delta t)$). If $\eta(x)$ again stands for the age distribution
of the parent population at t = 0, the function g(t) of (2.1) will obviously
measure the rate of production of females at time t by members of the parent
population. Thus we are again led to (1.1), with the difference, however, that
this time either of the inequalities

(2.3)    $\int_0^\infty f(t)\,dt \gtrless 1$





may occur; the value of this integral shows the tendency of increase or decrease
in the total population.

Theoretically speaking, f(t) and g(t) are two arbitrary non-negative functions.
It is true that g(t) is connected with f(t) by (2.1); but, since the age distribution
$\eta(x)$ is arbitrary, g(t) can also be considered as an arbitrarily prescribed function.

It is hardly necessary to interpret the more general equation (1.3) in detail;
it is the straightforward generalization of (1.1) to the case where the increase or
decrease of the population is not necessarily a continuous process. This form
of the equation is frequently better adapted to practical needs. Indeed, the
functions f(t) and g(t) are usually determined from observations, so that only
their mean values over some time units (years) are known. In such cases it is
sometimes simpler to treat f(t) and g(t) as discontinuous functions, using
equation (1.3) instead of (1.1). For some advantages of such a procedure
see section 7. It may also be mentioned that the most frequently (if not the
only) special case of (1.1) studied is that where g(t) = f(t). Now it is apparent
from (2.1) that this means that all members of the parent population are of
zero age: in this case, however, there is no continuous age-distribution $\eta(x)$.
Instead we have to use a discontinuous function $\eta(x)$ and write (2.1) in the form
of a Stieltjes integral. Thus discontinuous functions and Stieltjes integrals
present themselves automatically, though in a somewhat disguised form, even
in the simplest cases.

At this point a remark may be inserted which will prove useful for a better
understanding later on (section 6). In the current literature we are frequently
confronted not with (1.1) but with

(2.4)    $u(t) = \int_0^\infty u(t - x) f(x)\,dx$,

together with the explanation that it is asked to find a solution of (2.4) which
reduces, for t < 0, to a prescribed function h(t). Now such a function, as is
known, exists only under very exceptional conditions, and (2.4) is by no means
equivalent to (1.1). The current argument can be boiled down to the following.
Suppose first that the function g(t) of (1.1) is given in the special form

(2.5)    $g(t) = \int_t^\infty h(t - x) f(x)\,dx$,

where h(x) is a non-negative function defined for x < 0. Since the solution
u(t) of (1.1) has a meaning only for t > 0, we are free to define that u(−t) =
h(−t) for t > 0. This arbitrary definition, then, formally reduces (1.1) to (2.4).
It should be noted, however, that this function u(t) does not, in general, satisfy
(2.4) for t < 0, for h(t) was prescribed arbitrarily. Thus we are not, after all,
concerned with (2.4) but with (1.1), which form of the equation is, by the way,
the more general one for our purposes. If there really existed a solution of
(2.4) which reduced to h(t) for t < 0, we could of course define g(t) by (2.5) and
transform (2.4) into (1.1) by splitting the interval $(0, \infty)$ into the subintervals





(0, t) and $(t, \infty)$. However, as was already mentioned, a solution of the required
kind does not exist in general. It will also be seen (section 6) that the true
nature of the different methods and the limits of their applicability can be understood
only when the considerations are based on the proper equation (1.1) and
not on (2.4).

3. Existence of solutions.

Theorem 1. Let F(t) and G(t) be two finite non-decreasing functions which
are continuous to the right.⁵ Suppose that

(3.1)    $F(0) = G(0) = 0$,

and that the Laplace integrals⁶

(3.2)    $\varphi(s) = \int_0^\infty e^{-st}\,dF(t)$,    $\gamma(s) = \int_0^\infty e^{-st}\,dG(t)$

converge at least for $s > \sigma \ge 0$.⁷ In case that $\lim_{s \to \sigma+0} \varphi(s) > 1$, let $\sigma' > \sigma$ be the root⁸
of the characteristic equation $\varphi(s) = 1$; in case $\lim_{s \to \sigma+0} \varphi(s) \le 1$, put $\sigma' = \sigma$.
Under these conditions there exists for t > 0 one and only one finite non-decreasing
function U(t) satisfying (1.3). With this function the Laplace integral

(3.3)    $\omega(s) = \int_0^\infty e^{-st}\,dU(t)$


⁵ It is needless to emphasize that this restriction is imposed only to avoid trivial
ambiguities.

⁶ The integrals (3.2) should be interpreted as Lebesgue-Stieltjes integrals over open
intervals; thus

$\varphi(s) = \lim_{\epsilon \to +0} \int_\epsilon^\infty e^{-st}\,dF(t)$,

which implies that $\varphi(s) \to 0$ as $s \to \infty$. Alternatively it can be supposed that F(t) and
G(t) have no discontinuities at t = 0. Continuity of F(t) at t = 0 means that there is no
reproduction at zero age. This assumption is most natural for our problem, but is by no
means necessary. In order to investigate the case where F(t) has a saltus c > 0 at t = 0,
one should take the integrals (3.2) over the closed set $[0, \infty]$, so that

$\varphi(s) = c + \lim_{\epsilon \to +0} \int_\epsilon^\infty e^{-st}\,dF(t)$.

It is readily seen that Theorem 1 and its proof remain valid if 0 < c < 1. However, if
c > 1, then (1.3) plainly has no solution U(t). The continuity of G(t) at t = 0 is of no
importance and is not used in the sequel.

⁷ The condition is formulated in this general way in view of later applications (cf., e.g.,
the lemma of section 4). In all cases of practical interest $\sigma = 0$.

⁸ $\varphi(s)$ is, of course, monotonic for $s > \sigma$ and tends to zero as $s \to \infty$. In order to ensure
the existence of a root of $\varphi(s) = 1$, it is sufficient to suppose that the saltus c of F(t) at t = 0
is less than 1 (cf. footnote 6).



converges for $s > \sigma'$, and

(3.4)    $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$.
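
As an illustration of (3.4), which is not taken from the paper, consider the assumed special case $f(t) = g(t) = \lambda e^{-\lambda t}$ with a constant $\lambda > 0$, so that $F(t) = G(t) = 1 - e^{-\lambda t}$ and $\sigma = \sigma' = 0$. The transforms can then be written out in full:

```latex
\begin{align*}
\varphi(s) = \gamma(s) &= \int_0^\infty e^{-st}\,\lambda e^{-\lambda t}\,dt = \frac{\lambda}{\lambda + s}, \\
\omega(s) &= \frac{\gamma(s)}{1 - \varphi(s)}
           = \frac{\lambda/(\lambda + s)}{s/(\lambda + s)}
           = \frac{\lambda}{s}
           = \int_0^\infty e^{-st}\,\lambda\,dt .
\end{align*}
```

Under this assumption $U(t) = \lambda t$ and $u(t) = \lambda$, which can be confirmed directly in (1.1): $\lambda e^{-\lambda t} + \int_0^t \lambda \cdot \lambda e^{-\lambda x}\,dx = \lambda$.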


Proof. A trivial computation shows that for any finite non-decreasing solution
U(t) of (1.3) and any T > 0 we have

$\int_0^T e^{-st}\,dU(t) = \int_0^T e^{-st}\,dG(t) + \int_0^T e^{-sx}\,dF(x) \int_0^{T-x} e^{-st}\,dU(t)$;

herein all terms are non-negative and hence by (3.2)

$\int_0^T e^{-st}\,dU(t) \le \gamma(s) + \varphi(s) \int_0^T e^{-st}\,dU(t)$.

Now $\varphi(s) < 1$ for $s > \sigma'$, and hence it is seen that the integral (3.3) exists for
$s > \sigma'$ and satisfies (3.4). On the other hand it is well-known that the values of
$\omega(s)$ for $s > \sigma'$ determine the corresponding function U(t) uniquely, except for
an additive constant, at all points of continuity. However, from (1.3) and (3.1)
it follows that U(0) = 0 and, since by (1.3) U(t) is continuous to the right, the
monotone solution U(t) of (1.3), if it exists, is determined uniquely.

To prove the existence of U(t) consider a function $\omega(s)$ defined for $s > \sigma'$ by
(3.4). It is clear from (3.2) that $\varphi(s)$ and $\gamma(s)$ are completely monotone functions,
that is to say that $\varphi(s)$ and $\gamma(s)$ have, for $s > \sigma$, derivatives of all orders
and that $(-1)^n \varphi^{(n)}(s) \ge 0$ and $(-1)^n \gamma^{(n)}(s) \ge 0$. We can therefore differentiate
(3.4) any number of times, and it is seen that $\omega^{(n)}(s)$ is continuous for $s > \sigma'$.
Now a simple inductive argument shows that $(-1)^n \omega^{(n)}(s)$ is a product of
$\{1 - \varphi(s)\}^{-(n+1)}$ by a finite number of completely monotone functions. It
follows that $(-1)^n \omega^{(n)}(s) \ge 0$, so that $\omega(s)$ is a completely monotone function,
at least for $s > \sigma'$. Hence it follows from a well-known theorem of S. Bernstein
and D. V. Widder⁹ that there exists a non-decreasing function U(t) such that
(3.3) holds for $s > \sigma'$. Moreover, this function can obviously be so defined that
U(0) = 0 and that it is continuous to the right. Using U(t) let us form a new
function

(3.5)    $V(t) = \int_0^t U(t - x)\,dF(x)$.

V(t) is clearly non-negative and non-decreasing. It is readily verified (and, of
course, well-known) that

$\psi(s) = \int_0^\infty e^{-st}\,dV(t) = \omega(s)\varphi(s)$.

It follows, therefore, from (3.4) that $\psi(s) = \omega(s) - \gamma(s)$, and this implies, by the

⁹ This theorem has been repeatedly proved by several authors; for a recent proof cf.
Feller [19].





uniqueness theorem for Laplace transforms, that V(t) = U(t) − G(t). Combining
this result with (3.5) it is seen that U(t) is a solution of (1.3).

Theorem 2. Suppose that f(t) and g(t) are measurable, non-negative and
bounded in every finite interval 0 < t < T. Let the integrals

(3.6)    $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt$,    $\gamma(s) = \int_0^\infty e^{-st} g(t)\,dt$

converge for $s > \sigma$. Then there exists one and only one non-negative solution u(t)
of (1.1) which is bounded in every finite interval.¹⁰ With this function the integral

(3.7)    $\omega(s) = \int_0^\infty e^{-st} u(t)\,dt$

converges at least for $s > \sigma'$, where $\sigma' = \sigma$ if $\lim_{s \to \sigma+0} \varphi(s) \le 1$, and otherwise $\sigma' > \sigma$
is defined as the root of the characteristic equation $\varphi(s) = 1$. For $s > \sigma'$ equation
(3.4) holds.

If f(t) is continuous except, perhaps, at a finite number of points then u(t) − g(t)
is continuous.

Proof: Define F(t) and G(t) by (1.2). Under the present conditions these
functions satisfy the conditions of Theorem 1, and hence (1.3) has a non-decreasing
solution U(t). Consider, then, an arbitrary interval 0 < t < T and suppose
that in this interval f(t) < M and g(t) < M. If 0 < t < t + h < T we have
by (1.3)

$0 \le \frac{1}{h}\{U(t + h) - U(t)\}$

$= \frac{1}{h}\{G(t + h) - G(t)\} + \frac{1}{h}\int_t^{t+h} U(t + h - x) f(x)\,dx + \frac{1}{h}\int_0^t \{U(t + h - x) - U(t - x)\} f(x)\,dx$

$\le M + M\,U(T) + \frac{M}{h}\int_0^t \{U(t + h - x) - U(t - x)\}\,dx$

$= M + M\,U(T) + \frac{M}{h}\int_t^{t+h} U(y)\,dy - \frac{M}{h}\int_0^h U(y)\,dy$

$\le M + 2M\,U(T)$.

Thus U(t) has bounded difference ratios and is therefore an integral. The
derivative U′(t) exists for almost all t and 0 < U′(t) < M. Accordingly we can
differentiate (1.3) formally, and since U(0) = 0 it follows that u(t) = U′(t)
satisfies (1.1) for almost all t. However, changing u(t) on a set of measure zero
does not affect the integral in (1.1), and since g(t) is defined for all t it is seen that

¹⁰ Without the assumptions of positiveness and boundedness this theorem reduces to a
special case of a theorem by Paley and Wiener [21]; cf. section 1, p. 243.





u(t) can be defined, in a unique way, so as to satisfy (1.1) and obtain (1.3).
Since the solution of (1.3) was uniquely determined it follows that the solution
u(t) is also unique. Obviously equations (3.7) and (3.3) define the same function
$\omega(s)$, so that (3.4) holds, and (3.7) converges for $s > \sigma'$.

Finally, if f(t) has only a finite number of jumps, the continuity of u(t) − g(t)
becomes evident upon writing (1.1) in the form

$u(t) - g(t) = \int_0^t u(x) f(t - x)\,dx$.


4. Asymptotic properties. In this section we shall be concerned with the
asymptotic behavior as $t \to \infty$ not of u(t) itself but of the mean value
$u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau$. If u(t) tends to the (not necessarily finite) limit C, then obviously
also $u^*(t) \to C$, whereas the converse is not necessarily true. For the proof of
the theorem we shall need the following obvious but useful

Lemma: If $u(t) \ge 0$ is a solution of (1.1) and if

(4.1)    $u_1(t) = e^{kt}u(t)$,    $f_1(t) = e^{kt}f(t)$,    $g_1(t) = e^{kt}g(t)$,

then $u_1(t)$ is a solution of

$u_1(t) = g_1(t) + \int_0^t u_1(t - x) f_1(x)\,dx$.
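
The lemma can be verified in one line, as the following worked step shows (added here for illustration; it is not printed in the paper). Multiplying (1.1) by $e^{kt}$ and splitting the exponential as $e^{kt} = e^{k(t-x)} e^{kx}$ inside the integral gives

```latex
\begin{align*}
e^{kt}u(t) &= e^{kt}g(t) + \int_0^t e^{k(t-x)}u(t-x)\; e^{kx}f(x)\,dx, \\
\text{that is,}\qquad u_1(t) &= g_1(t) + \int_0^t u_1(t-x)\,f_1(x)\,dx .
\end{align*}
```

In terms of Laplace transforms the substitution is a shift of the argument: the transform of $f_1$ at s equals $\varphi(s - k)$ wherever the latter converges.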

Theorem 3: Suppose that using the functions defined in Theorem 2 the integrals

(4.2)    $\int_0^\infty f(t)\,dt = a$,    $\int_0^\infty g(t)\,dt = b$

are finite.

(i) In order that

(4.3)    $u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau \to C$

as $t \to \infty$, where C is a positive constant, it is necessary and sufficient that a = 1,
and that the moment

(4.4)    $\int_0^\infty t f(t)\,dt = m$

be finite. In this case

(4.5)    $C = \frac{b}{m}$.

(ii) If a < 1 we have

(4.6)    $\int_0^\infty u(t)\,dt = \frac{b}{1 - a}$.

(iii) If a > 1 let $\sigma'$ be the positive root of the characteristic equation $\varphi(s) = 1$
(cf. (3.2)) and put¹¹

(4.7)    $\int_0^\infty e^{-\sigma' t}\,t f(t)\,dt = m_1$.

Then

(4.8)    $\lim_{t \to \infty} \frac{1}{t}\int_0^t e^{-\sigma'\tau} u(\tau)\,d\tau = \frac{\gamma(\sigma')}{m_1}$.

Remark: The case a = 1 corresponds in demography to a population of
stationary size. In the theory of industrial replacement only the case a = 1
occurs; the moment m is the average lifetime of an individual. The case a > 1
corresponds in demography to a population in which the fertility is greater than
the mortality. As is seen from (4.8), in this case the mean value of u(t) increases
exponentially. It is of special interest to note that in a population with a < 1
the integral (4.6) always converges.
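
Case (i) of Theorem 3 can be illustrated numerically. The sketch below is not from the paper: it discretizes (1.1) by the trapezoidal rule on a uniform grid, and the function name `solve_renewal` and the test functions are assumptions chosen for illustration. With $f(t) = g(t) = t e^{-t}$ one has a = 1, b = 1 and m = 2, so the theorem predicts the limit b/m = 1/2; for this particular choice the solution is known in closed form, $u(t) = (1 - e^{-2t})/2$.

```python
import math


def solve_renewal(f, g, t_max, h):
    """Approximate u(n*h), n = 0..N, for u(t) = g(t) + int_0^t u(t - x) f(x) dx.

    Trapezoidal discretization on a uniform grid; requires h * f(0) / 2 < 1.
    """
    n_steps = int(round(t_max / h))
    fv = [f(n * h) for n in range(n_steps + 1)]
    gv = [g(n * h) for n in range(n_steps + 1)]
    u = [gv[0]]  # at t = 0 the integral vanishes
    for n in range(1, n_steps + 1):
        # interior nodes x = k*h (k = 1..n-1) plus half weight at x = t_n
        s = sum(u[n - k] * fv[k] for k in range(1, n)) + 0.5 * u[0] * fv[n]
        # the node x = 0 multiplies the unknown u(t_n); solve for it
        u.append((gv[n] + h * s) / (1.0 - 0.5 * h * fv[0]))
    return u


def erlang2(t):
    """Density t * exp(-t); total mass 1, first moment 2."""
    return t * math.exp(-t)


if __name__ == "__main__":
    u = solve_renewal(erlang2, erlang2, 20.0, 0.01)
    print(u[100], u[-1])  # compare with (1 - exp(-2)) / 2 and with 1/2
```

The quadratic cost in the number of grid points is acceptable for a check of this size; the step h and the horizon are arbitrary choices.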
Proof: By (4.2) and (3.7)

(4.9)    $\lim_{s \to +0} \varphi(s) = a$,    $\lim_{s \to +0} \gamma(s) = b$.

If a < 1, it follows from (3.4) that $\lim_{s \to +0} \omega(s) = b/(1 - a)$ is finite. Since $u(t) \ge 0$
this obviously implies that (4.6) holds. This proves (ii).

If a = 1 and m is finite, it is readily seen that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = m$,

and hence by (3.4)

$\lim_{s \to +0} s\omega(s) = \lim_{s \to +0} \gamma(s) \cdot \lim_{s \to +0} \frac{s}{1 - \varphi(s)} = \frac{b}{m}$.

By a well-known Tauberian theorem for Laplace integrals of non-negative
functions¹² it follows that $u^*(t) \to b/m$. Conversely, if (4.3) holds it is readily
seen that¹³


¹¹ (4.2) implies the finiteness of $m_1$.

¹² Cf. e.g. Doetsch [18], p. 208 or 210.

¹³ Indeed, if (4.3) holds and if U(t) is defined by (1.2), then there is an $M = M(\epsilon)$ such that
$|U(t) - Ct| < M + \epsilon t$. Now

$\omega(s) = s\int_0^\infty e^{-st}\,U(t)\,dt$,

and hence

$s\omega(s) - C = s^2\int_0^\infty e^{-st}\,(U(t) - Ct)\,dt$,

or

$|s\omega(s) - C| < s^2\int_0^\infty e^{-st}\,(M + \epsilon t)\,dt = sM + \epsilon$.



$\lim_{s \to +0} s\omega(s) = C$,

which, in turn, implies by (3.4) and (4.9) that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = \frac{b}{C}$.

This obviously means that the moment (4.4) exists and equals b/C. This
proves (i).

Finally case (iii) reduces immediately to (ii) using the above lemma with
$k = -\sigma'$. This finishes the proof.

It may be remarked that the finiteness of the integrals (4.2) is by no means
necessary for (4.3). This is shown by the following

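Example: Let

$f(t) = \frac{1}{2\sqrt{\pi}}\,t^{-3/2}\,e^{-1/(4t)}$,    $g(t) = \frac{1}{\sqrt{\pi t}}\,e^{-1/(4t)}$.

It is readily seen that with these functions a = 1, but $b = \infty$. Now¹⁴ $\varphi(s) = e^{-\sqrt{s}}$
and $\gamma(s) = e^{-\sqrt{s}}/\sqrt{s}$, so that

$\omega(s) = \frac{e^{-\sqrt{s}}}{\sqrt{s}\,(1 - e^{-\sqrt{s}})}$.

Thus $s\omega(s) \to 1$ as $s \to +0$, and hence $u^*(t) \to 1$. In this particular case it can
even be shown that the solution u(t) itself tends to 1 as $t \to \infty$.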
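
The limit used in this example is easy to check numerically. The sketch below is an added illustration, not part of the paper: it takes the transforms $e^{-\sqrt{s}}$ and $e^{-\sqrt{s}}/\sqrt{s}$ stated in the example, forms the transform of u by (3.4), and evaluates s times that transform for decreasing s. The function name is hypothetical.

```python
import math


def s_omega(s):
    """s * omega(s) with omega(s) = exp(-sqrt(s)) / (sqrt(s) * (1 - exp(-sqrt(s))))."""
    x = math.sqrt(s)
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation for small x
    return x * math.exp(-x) / (-math.expm1(-x))


if __name__ == "__main__":
    for s in (1e-2, 1e-4, 1e-6):
        print(s, s_omega(s))
```

The printed values increase toward 1 as s decreases, in agreement with the statement that the mean value of u tends to 1.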

In practice, however, the integrals (4.2) will always exist, and accordingly we
restrict the consideration to this case.


5. Closer study of asymptotic properties. In this section we shall deal almost
exclusively with the most important special case, namely where

(5.1)    $\int_0^\infty f(t)\,dt = 1$.

The question has been much discussed whether in this case necessarily $u(t) \to C$
as $t \to \infty$, which statement, if true, would be a refinement of (4.3). Hadwiger
[2] has constructed a rather complicated example to show that u(t) does not
necessarily approach a limit. Now this can also be seen directly and without
any computations. Indeed, if $u(t) \to C$ and if (5.1) holds, then obviously

$\lim_{t \to \infty} \int_0^t u(t - x) f(x)\,dx = C$,

and hence it follows from (1.1) that $g(t) \to 0$. In order that $u(t) \to C$ it is therefore


¹⁴ The integrals can be evaluated by elementary methods, and are known; cf. e.g.
Doetsch [18], p. 25.





necessary that $g(t) \to 0$, and this proves the assertion. In Hadwiger’s example
$\limsup g(t) = \infty$, which makes his computations unnecessary.

It can be shown in a similar manner that not even the condition $g(t) \to 0$ is
sufficient to ensure that $u(t) \to C$. Some restriction as to the total variation of
f(t) seems both necessary and natural (conditions on the existence of derivatives
are not sufficient). In the following theorem we shall prove the convergence of
u(t) under a condition which is, though not strictly necessary, sufficiently wide
to cover all cases of any possible practical interest.

Theorem 4: Suppose that with the functions f(t) and g(t) of Theorem 2

(5.2)    $\int_0^\infty f(t)\,dt = 1$,    $\int_0^\infty g(t)\,dt = b < \infty$.

Suppose moreover that there exists an integer $n \ge 2$ such that the moments

(5.3)    $m_k = \int_0^\infty t^k f(t)\,dt$,    $k = 1, 2, \ldots, n$,

are finite, and that the functions $f(t), t f(t), t^2 f(t), \ldots, t^{n-2} f(t)$ are of bounded total
variation over $(0, \infty)$. Suppose finally that

(5.4)    $\lim_{t \to \infty} t^{n-2} g(t) = 0$  and  $\lim_{t \to \infty} t^{n-2} \int_t^\infty g(x)\,dx = 0$.

Then

(5.5)    $\lim_{t \to \infty} u(t) = \frac{b}{m_1}$

and

(5.6)    $\lim_{t \to \infty} t^{n-2} \left\{ u(t) - \frac{b}{m_1} \right\} = 0$.

t-* \ mi) 

Remark: As it was shown in section 4, the case where $\int_0^\infty f(t)\,dt > 1$
can readily be reduced to the above theorem by applying the lemma of section 4
with $k = \sigma'$, where $\sigma'$ is the positive root of $\varphi(s) = 1$: it is only necessary to
suppose that $e^{-\sigma' t} f(t)$ is of bounded total variation and that $e^{-\sigma' t} g(t) \to 0$.
Obviously all moments of $e^{-\sigma' t} f(t)$ exist, so that the above theorem shows that
$u^*(t) = e^{-\sigma' t} u(t)$ tends to the finite limit $b'/m_1'$, where

$b' = \int_0^\infty e^{-\sigma' t} g(t)\,dt, \qquad m_1' = \int_0^\infty e^{-\sigma' t}\, t f(t)\,dt.$

Thus in this case and under the above assumptions $u(t) \sim \frac{b'}{m_1'} e^{\sigma' t}$, so that the
renewal function increases exponentially as could be expected. If however

$\int_0^\infty f(t)\,dt < 1,$



254 


"WILLY FELLER 


$u(t)$ will in general not show an exponential character. If $f(t)$ is of bounded
variation and has a finite moment of second order, and if $g(t) \to 0$, then it can be
shown that $u(t) \to 0$. However, the lemma of section 4 can be applied only if
the integral defining $\varphi(s)$ converges in some negative $s$-interval containing a value
$s'$ such that $\varphi(s') = 1$, and this is in general not the case.

Proof: The proof of Theorem 4 will be based on a Tauberian theorem due to
Haar.¹⁵ With some specializations and obvious changes this theorem can be
formulated as follows.

Suppose that $l(t)$ is, for $t \ge 0$, non-negative and continuous, and that the
Laplace integral

(5.7)    $\lambda(s) = \int_0^\infty e^{-st}\, l(t)\,dt$

converges for $s > 0$. Consider $\lambda(s)$ as a function of the complex variable $s = x + iy$
and suppose that the following conditions are fulfilled:

(i) For $y \ne 0$ the function $\lambda(s)$ (which is always regular for $x > 0$) has continuous
boundary values $\lambda(iy)$ as $x \to +0$; for $x \ge 0$ and $y \ne 0$

(5.8)    $\lambda(s) = \frac{C}{s} + \psi(s),$

where $\psi(iy)$ has finite derivatives $\psi'(iy), \ldots, \psi^{(r)}(iy)$ and $\psi^{(r)}(iy)$ is bounded
in every finite interval;

(ii) $\int_{-\infty}^{+\infty} e^{ity} \lambda(x + iy)\,dy$
converges for some fixed $x > 0$ uniformly with respect to $t \ge T > 0$;

(iii) $\lambda(x + iy) \to 0$ as $y \to \pm\infty$, uniformly with respect to $x \ge 0$;

(iv) $\lambda'(iy), \lambda''(iy), \ldots, \lambda^{(r)}(iy)$ tend to zero as $y \to \pm\infty$;

(v) The integrals

$\int_{-\infty}^{y_1} e^{ity} \lambda^{(r)}(iy)\,dy \quad \text{and} \quad \int_{y_2}^{\infty} e^{ity} \lambda^{(r)}(iy)\,dy$

(where $y_1 < 0$ and $y_2 > 0$ are fixed) converge uniformly with respect to $t \ge T > 0$.
Under these conditions

(5.9)    $\lim_{t \to \infty} t^r \{ l(t) - C \} = 0.$

Now the hypotheses of this theorem are too restrictive to be applied to the
solution $u(t)$ of (1.1). We shall therefore replace (1.1) by the more special
equation

(5.10)    $v(t) = h(t) + \int_0^t v(t - x) f(x)\,dx,$


15 Haar [20] or Doetsch [18], p. 269. 



where

(5.11)    $h(t) = \int_0^t f(t - x) f(x)\,dx.$

Plainly Theorem 2 can be applied to (5.10). It is also plain that $h(t)$ is bounded
and non-negative and that (by (5.1))

(5.12)    $\int_0^\infty h(t)\,dt = 1,$

(5.13)    $\int_0^\infty e^{-st} h(t)\,dt = \varphi^2(s).$

Accordingly we have by Theorem 2

(5.14)    $\zeta(s) = \int_0^\infty e^{-st} v(t)\,dt = \frac{\varphi^2(s)}{1 - \varphi(s)}.$

We shall first verify that $\zeta(s)$ satisfies the conditions of Haar’s theorem with
$r = n - 2$. For this purpose we write

(5.15)    $f(t) = f_1(t) - f_2(t),$

where $f_1(t)$ and $f_2(t)$ are non-decreasing and non-negative functions which are,
by assumption, bounded:

(5.16)    $0 \le f_1(t) < M, \qquad 0 \le f_2(t) < M.$

(a) We show that $v(t)$ is continuous. Now by Theorem 2 the solution $v(t)$
of (5.10) is certainly continuous if $h(t)$ is continuous; however, that $h(t)$ is
continuous follows directly from (5.11) and the fact that the functions

$\int_0^t f_1(t - x) f(x)\,dx \quad \text{and} \quad \int_0^t f_2(t - x) f(x)\,dx$

are continuous.

(b) In view of (5.1) the function $\varphi(s)$ exists for $x = \Re(s) \ge 0$. Obviously
$|\varphi(x + iy)| < 1$ for $x > 0$. Now

$1 - \varphi(iy) = \int_0^\infty (1 - e^{-iyt}) f(t)\,dt = \int_0^\infty (1 - \cos yt) f(t)\,dt + i \int_0^\infty \sin yt \cdot f(t)\,dt,$

and, since $1 - \cos yt \ge 0$ and $f(t) \ge 0$, the equality $\varphi(iy) = 1$ for $y \ne 0$ would
imply that $f(t) = 0$ except on a set of measure zero. It is therefore seen that
$\varphi(x + iy) \ne 1$ for all $x > 0$ and for $x = 0$, $y \ne 0$.

It follows furthermore from (5.3) that for $k = 1, \ldots, n$ and $x \ge 0$ the derivatives

$\varphi^{(k)}(s) = \int_0^\infty (-t)^k e^{-st} f(t)\,dt$

exist and that

$\lim_{x \to +0} \varphi^{(k)}(x + iy) = \varphi^{(k)}(iy).$

Finally, it is readily seen that in the neighborhood of $y = 0$ we have

(5.17)    $\varphi(iy) = \int_0^\infty e^{-iyt} f(t)\,dt = 1 - m_1 iy + \frac{m_2}{2!}(iy)^2 - \cdots + (-1)^n \frac{m_n}{n!}(iy)^n + o(|y|^n).$


(c) From what was said under (b) it follows by (5.14) that $\zeta(s)$ is regular for
$x > 0$, and that $\zeta(s), \zeta'(s), \ldots, \zeta^{(n)}(s)$ approach continuous boundary values
as $s = x + iy$ approaches a point of the imaginary axis other than the origin.
Now put

(5.18)    $\psi(s) = \frac{\varphi^2(s)}{1 - \varphi(s)} - \frac{1}{m_1 s},$

so that by (5.14)

(5.19)    $\zeta(s) = \frac{1}{m_1 s} + \psi(s).$

For $x > 0$ and $x = 0$, $y \ne 0$ the function $\psi(x + iy)$ is obviously continuous;
the derivatives $\psi'(iy), \ldots, \psi^{(n)}(iy)$ exist. To investigate the behavior of
$\psi(iy)$ in the neighborhood of $y = 0$ put

(5.20)    $P(y) = m_1 - \frac{m_2}{2!}(iy) + \cdots + (-1)^{n-1} \frac{m_n}{n!}(iy)^{n-1}.$

By (5.17), (5.18) and (5.20)

(5.21)    $\psi(iy) = \frac{1}{iy} \left[ \frac{\{1 - iy P(y)\}^2}{P(y)} - \frac{1}{m_1} \right] + o(|y|^{n-2}).$

Now the expression in brackets represents an analytic function of $y$ which
vanishes at $y = 0$. Hence $\psi(iy) = \mathfrak{P}(y) + o(|y|^{n-2})$, where $\mathfrak{P}(y)$ denotes a
power series. It follows that the derivatives $\psi'(iy), \ldots, \psi^{(n-2)}(iy)$ exist for all
real $y$ (including $y = 0$) and are bounded for sufficiently small $|y|$: since they
are continuous functions they are bounded in every finite interval.

(d) Next we show that there exists a constant $A > 0$ such that for sufficiently
large $|y|$

(5.22)    $|\varphi(x + iy)| < \frac{A}{|y|}$

uniformly in $x \ge 0$. By (5.15)

(5.23)    $\varphi(x + iy) = \int_0^\infty \{\cos yt - i \sin yt\}\, e^{-xt} \{f_1(t) - f_2(t)\}\,dt.$

Now $f_1(t)$ is non-decreasing and accordingly by the second mean-value theorem
we have for any $T > 0$ and $y$

$\int_0^T \cos yt \cdot f_1(t)\,dt = f_1(T) \int_\tau^T \cos yt\,dt = f_1(T)\, \frac{\sin yT - \sin y\tau}{y},$

where $\tau$ is some value between 0 and $T$ (depending, of course, on $y$; at points of
discontinuity, $f_1(T)$ should be replaced by $\lim_{t \to T-0} f_1(t)$). Hence by (5.16)

$\left| \int_0^\infty \cos yt \cdot e^{-xt} f_1(t)\,dt \right| < \frac{2M}{|y|}.$

Treating the other terms in (5.23) in a like manner, (5.22) follows.
Combining (5.22) with (5.14) it is seen that for sufficiently large $|y|$

$|\zeta(s)| < \frac{2A^2}{|y|^2}$

uniformly in $x \ge 0$. This shows that the assumptions (ii) and (iii) of Haar’s
theorem are satisfied for $\lambda(s) = \zeta(s)$. In order to prove that also conditions
(iv) and (v) are satisfied it suffices to notice that the proof of (5.22) used only
the fact that $f(t)$ is of bounded total variation. Now $\varphi^{(k)}(s)$ is the Laplace
transform of $(-t)^k f(t)$, and, since $t^k f(t)$ is of bounded total variation for $k \le n - 2$,
it follows that

$|\varphi^{(k)}(s)| = O(|y|^{-1}), \qquad k = 1, 2, \ldots, n - 2,$

for sufficiently large $|y|$, uniformly in $x \ge 0$. Differentiating (5.14) $k$ times it is
also seen that

$|\zeta^{(k)}(s)| = O(|y|^{-2}), \qquad k = 1, 2, \ldots, n - 2,$

as $y \to \pm\infty$, uniformly with respect to $x \ge 0$.

This enumeration shows that $l(t) = v(t)$ and $\lambda(s) = \zeta(s)$ satisfy all hypotheses
of Haar’s theorem with $r = n - 2$ and $C = 1/m_1$. Hence

(5.24)    $\lim_{t \to \infty} t^{n-2} \left\{ v(t) - \frac{1}{m_1} \right\} = 0.$

Returning now to (5.14) we get

$\omega(s) = \gamma(s) + \gamma(s)\varphi(s) + \gamma(s)\zeta(s),$

or, by the uniqueness property of Laplace integrals,

(5.25)    $u(t) = g(t) + \int_0^t g(x) f(t - x)\,dx + \int_0^t g(x) v(t - x)\,dx = g(t) + u_1(t) + u_2(t)$

(which relation can also be checked directly using (5.10)). Let us begin with
the last term. We have by (5.2)

$u_2(t) - \frac{b}{m_1} = \int_0^t g(t - x) \left\{ v(x) - \frac{1}{m_1} \right\} dx,$

and hence

$t^{n-2} \left| u_2(t) - \frac{b}{m_1} \right| \le 2^{n-2} \int_{t/2}^{t} g(t - x)\, x^{n-2} \left| v(x) - \frac{1}{m_1} \right| dx + t^{n-2} \int_{t/2}^{t} g(y) \left| v(t - y) - \frac{1}{m_1} \right| dy.$

If $t$ is sufficiently large we have by (5.24) in the first integral $x^{n-2} \left| v(x) - \frac{1}{m_1} \right| < \epsilon$.
In the second integral $\left| v(t - y) - \frac{1}{m_1} \right|$ is bounded, and hence by (5.4)

$\lim_{t \to \infty} t^{n-2} \left| u_2(t) - \frac{b}{m_1} \right| = 0.$

The same argument applies (even with some simplifications) also to the second
term in (5.25); it follows that

$\lim_{t \to \infty} t^{n-2} u_1(t) = 0,$

whilst $t^{n-2} g(t) \to 0$ by assumption (5.4). Now the assertion (5.6) of our theorem
follows in view of (5.25) if the last three relationships are added. This finishes
the proof of Theorem 4.
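The limit (5.5) can be checked numerically. The following is only a sketch under stated assumptions: the function name `solve_renewal`, the trapezoidal discretization, and the test pair $f(t) = t e^{-t}$, $g(t) = e^{-t}$ are illustrative choices that do not appear in the paper. For this pair $b = 1$ and $m_1 = 2$, and Theorem 2 gives $\omega(s) = (s+1)/\{s(s+2)\}$, that is $u(t) = \tfrac12 + \tfrac12 e^{-2t}$, so the computed values can be compared both with the closed form and with $b/m_1$.

```python
import math

def solve_renewal(f, g, t_max, h):
    """Approximate u on the grid t_k = k*h for u(t) = g(t) + int_0^t u(t-x) f(x) dx.

    Trapezoidal rule for the convolution; the term containing the unknown
    u(t_k) is moved to the left-hand side before dividing.
    """
    n = int(round(t_max / h))
    fs = [f(k * h) for k in range(n + 1)]
    gs = [g(k * h) for k in range(n + 1)]
    u = [gs[0]]                      # u(0) = g(0)
    for k in range(1, n + 1):
        acc = 0.5 * u[0] * fs[k]     # end point x = t_k
        for j in range(1, k):        # interior points x = t_j
            acc += u[k - j] * fs[j]
        u.append((gs[k] + h * acc) / (1.0 - 0.5 * h * fs[0]))
    return u

h = 0.01
u = solve_renewal(lambda t: t * math.exp(-t),   # f, with m_1 = 2
                  lambda t: math.exp(-t),       # g, with b = 1
                  10.0, h)
b, m1 = 1.0, 2.0
exact = [0.5 + 0.5 * math.exp(-2.0 * k * h) for k in range(len(u))]
max_err = max(abs(a - e) for a, e in zip(u, exact))
print("u(10) ~", u[-1], " b/m1 =", b / m1, " max error =", max_err)
```

In this example the approach to the limit is monotone and exponentially fast, so (5.6) holds for every $n$.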

It seems that the solution $u(t)$ is generally supposed to oscillate around its
limit $b/m_1$ as $t \to \infty$. It goes without saying that such a behavior is a priori
more likely than a monotone character. It should, however, be noticed that
there is no reason whatsoever to suppose that $u(t)$ always oscillates around its
limit. Again no computation is necessary to see this, as shown by the following
Example: Differentiating (1.1) formally we get

$u'(t) = g'(t) + g(0) f(t) + \int_0^t u'(t - x) f(x)\,dx,$

which shows that, if $g(t)$ and $f(t)$ are sufficiently regular, $u'(t)$ satisfies an integral
equation of the same type as $u(t)$. Thus if

$g'(t) + g(0) f(t) \ge 0$

for all $t$, we shall have $u'(t) \ge 0$, and $u(t)$ is a monotone function. In particular,
if $g'(t) + g(0) f(t) = 0$, then $u'(t) = 0$ and $u(t) = \text{const}$. For example, let $f(t) =
g(t) = e^{-t}$. Then $\varphi(s) = \gamma(s) = 1/(s + 1)$ and hence $\omega(s) = 1/s$, which is the
Laplace transform of $u(t) = 1$. It is also seen directly that $u(t) \equiv 1$ is the
solution. We have however the following
Theorem 5.¹⁶ If the functions $f(t)$ and $g(t)$ of Theorem 4 vanish identically for
$t \ge T > 0$, then the solution $u(t)$ of (1.1) oscillates around its limit $b/m_1$ as $t \to \infty$.

16 Under some slight additional hypotheses and with quite different methods this theorem
was proved by Richter [16].



Proof: For $t > T$ equation (1.1) reduces to

$u(t) = \int_{t-T}^{t} u(x) f(t - x)\,dx,$

and since $\int_{t-T}^{t} f(t - x)\,dx = 1$ it follows that the maxima of $u(t)$ in the intervals
$nT \le t \le (n + 1)T$ form, for sufficiently large integers $n$, a non-increasing sequence.
Similarly the corresponding minima do not decrease. Since $u(t) \to b/m_1$, by
Theorem 4, it follows that the minima do not exceed $b/m_1$ and the maxima are
not smaller than $b/m_1$.
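Theorem 5 can be illustrated on a simple case. The sketch below assumes, purely for illustration, that $f(t) = g(t) = 1$ for $0 \le t < 1$ and $0$ afterwards (so $T = 1$, $b = 1$, $m_1 = \tfrac12$ and the limit is $b/m_1 = 2$); the grid, the trapezoidal rule and the value $\tfrac12$ assigned at the jump are choices of this sketch, not of the paper. On $0 \le t < 1$ the exact solution is $e^t$, and on $1 \le t \le 2$ it is $e^{t-1}(e - t)$, which already crosses the level 2 twice.

```python
import math

h = 0.002
m = 500                     # T = m*h = 1: f and g vanish for t >= T
n = 6 * m                   # solve on 0 <= t <= 6
fs = [1.0] * m + [0.5]      # value 0.5 at the jump keeps the discrete mass of f equal to 1
gs = fs + [0.0] * (n - m)   # g = f, extended by zeros
u = [gs[0]]
for k in range(1, n + 1):
    # trapezoidal end-point term at x = t_k (f vanishes there once k > m)
    acc = 0.5 * u[0] * fs[k] if k <= m else 0.0
    # only x <= T contributes, as in the reduced equation of the proof
    for j in range(1, min(k, m + 1)):
        acc += u[k - j] * fs[j]
    u.append((gs[k] + h * acc) / (1.0 - 0.5 * h * fs[0]))

limit = 2.0                 # b/m_1 with b = 1, m_1 = 1/2
crossings = sum(1 for k in range(1, n + 1)
                if (u[k - 1] - limit) * (u[k] - limit) < 0)
print("u(0.9) =", u[450], " exp(0.9) =", math.exp(0.9))
print("u(1.1) =", u[550], " u(1.72) =", u[860], " u(6) =", u[-1])
print("sign changes of u - b/m1 on the grid:", crossings)
```

The values lie alternately above and below 2 while the amplitude of the deviation shrinks rapidly, in agreement with the statement that the maxima over successive intervals of length $T$ do not increase and the minima do not decrease.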

6. On Lotka’s method. Probably the most widely used method for treating
equation (1.1) in connection with problems of the renewal theory is Lotka’s
method. As a matter of fact this method consists of two independent parts.
The first step aims at obtaining the exact solution of (1.1) in the form of a series
of exponential terms (this is achieved by an adaptation of a method which was
used by P. Herz and Herglotz for other purposes). The second part of Lotka’s
theory consists of devices for a convenient approximative computation of the
first few terms of the series. While restricting ourselves formally to Lotka’s
theory, it will be seen that some of the following remarks apply equally to other
methods.

Lotka’s method rests essentially on the fundamental assumption that the
characteristic equation

(6.1)    $\varphi(s) = 1$

has infinitely many distinct simple¹⁷ roots $s_0, s_1, \ldots$, and that the solution $u(t)$
of (1.1) can be expanded into a series

(6.2)    $u(t) = \sum_k A_k e^{s_k t},$

where the $A_k$ are complex constants. The argument usually rests on an assumed
completeness-property of the roots. Thus, starting from (2.4) it is required that
(6.2) reduces to $h(t)$ for $t < 0$; in other words, that an arbitrarily prescribed
function $h(x)$ be, for $x < 0$, representable in the form

(6.3)    $h(x) = \sum_k A_k e^{s_k x} \qquad (x < 0).$

In practice we are, of course, usually not concerned with $h(t)$ but with $g(t)$ (cf.
(2.5)), and according to Lotka’s theory the coefficients $A_k$ of the solution (6.2)
of (1.1) can be computed directly from $g(t)$ in a way similar to the computation
of the Fourier coefficients.

Lotka’s method is known to lead to correct results in many cases and also to
have distinct computational merits. On the other hand it seems to require a
safer justification, since its fundamental assumptions are rarely realized. Thus
clearly an arbitrary function $h(x)$ cannot be represented in the form (6.3): to
see this it suffices to note that (6.1) frequently has only a finite number of roots
(cf. also below). It should also be noted that, the series (6.3) having regularity
properties as are assumed in Lotka’s theory, any function representable in the
form (6.3) is necessarily a solution of the integral equation (2.4), whereas the
theory requires us to construct a solution $u(t)$ which reduces to an arbitrarily
prescribed function $h(t)$ for $t < 0$ (which frequently is an empirical function,
determined by observations). Nevertheless, it is possible to give sound
foundations to Lotka’s method so that it can be used (with some essential limitations
and modifications) sometimes even in cases for which it originally was not
intended. For this purpose it turns out to be necessary that all considerations
be based on the more general equation (1.1), instead of (2.4) (cf. also section 2).

17 Hadwiger [3] objected to the assumption that all roots of (6.1) be simple. The
modifications which are necessary to cover the case of multiple roots also will be indicated below.

Before proceeding it is necessary to make clear what is really meant by a root of
(6.1). The function $\varphi(s)$ is defined by (3.2), and the integral will in general
converge only for $s$-values situated in the half-plane $\Re(s) > \sigma$. Usually only
roots situated in this half-plane are considered.¹⁸ It is also argued that $\varphi(s)$
is, for real $s$, a monotone function, so that (6.1) has at most one real root;
accordingly the terms of (6.2) are called “oscillatory components.” However,
the function $\varphi(s)$ can usually be defined by analytic continuation even outside
the half-plane $\Re(s) > \sigma$, and, if this is done, (6.1) will in general also have roots
in the half-plane $\Re(s) < \sigma$. It will be seen in the sequel that these roots play
exactly the same role for the solution $u(t)$ as the other ones, and that the
applicability of Lotka’s method depends on the behavior of $\varphi(s)$ in the entire
complex $s$-plane. It may be of interest to quote an example where (6.1) has
infinitely many real and no other roots.

Example¹⁹: Let

(6.4)    $f(t) = \frac{1}{2\sqrt{\pi}\, t^{3/2}}\, e^{-1/(4t)}, \qquad t > 0.$


18 This was stated in particular by Hadwiger [3] and Hadwiger and Ruchti [6];
accordingly the results of the latter paper (obtained by methods quite different from Lotka’s)
need some modifications.

19 Cf. the example at the end of section 4. A function closely related to (6.4)
plays an important role in two recent papers by Hadwiger [4] and [6]. Hadwiger’s
conclusion, if it could be justified, would fundamentally change the aspect of the whole theory.
The conclusion reached by Hadwiger seems to be that for any biological population the
reproduction function should be of the form $u(t) = \sum u_n(t)$, where $u_n(t)$ represents the
contribution of the nth generation and

(*)    $u_n(t) = \frac{na}{\sqrt{\pi}\, t^{3/2}}\, e^{-At + Cna - n^2 a^2 / t}.$

Here $a$, $A$ and $C$ are constants. Clearly (*) is a generalization of (6.4). Now his conclusion
is based on the arbitrary assumption that $u_n(t)$ should be of the form $u_n(t) = \psi(t, na)$



It is easily seen that $\varphi(s) = e^{-\sqrt{s}}$. The integral (3.2) converges only for $\Re(s) \ge 0$,
but $\varphi(s)$ is defined as a two-valued function in the entire $s$-plane. The roots of
(6.1) are obviously $s_k = -4k^2\pi^2$, so that all of them are real and simple. If
$g(t) = f(t)$, we get by (3.4)

$\omega(s) = \frac{e^{-\sqrt{s}}}{1 - e^{-\sqrt{s}}} = \sum_{n=1}^{\infty} e^{-n\sqrt{s}}, \qquad s \text{ real}, > 0.$

Now $e^{-n\sqrt{s}}$ is the Laplace transform of $\frac{n}{2\sqrt{\pi}\, t^{3/2}}\, e^{-n^2/(4t)}$, and hence it is readily
seen that the solution $u(t)$ can be written in the form

(6.5)    $u(t) = \frac{1}{2\sqrt{\pi}\, t^{3/2}} \sum_{n=1}^{\infty} n\, e^{-n^2/(4t)};$

of course, this expansion is not of form (6.2) and shows no oscillatory character.
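A quick numerical check of this example can be sketched as follows. It assumes the transform pair $e^{-n\sqrt{s}} \leftrightarrow \frac{n}{2\sqrt{\pi}\,t^{3/2}} e^{-n^2/(4t)}$ and the roots $s_k = -4k^2\pi^2$ quoted above; the helper names `phi` and `u_series`, and the use of the principal branch of the square root, are choices of this sketch.

```python
import cmath
import math

def phi(s):
    """phi(s) = exp(-sqrt(s)) on the principal branch of the square root."""
    return cmath.exp(-cmath.sqrt(s))

# candidate roots s_k = -4 k^2 pi^2 of the characteristic equation phi(s) = 1
roots = [complex(-4.0 * k * k * math.pi ** 2, 0.0) for k in range(5)]
residuals = [abs(phi(s) - 1.0) for s in roots]

def u_series(t, terms=200):
    """Partial sum of u(t) = (2 sqrt(pi) t^{3/2})^{-1} * sum_{n>=1} n exp(-n^2/(4t))."""
    total = sum(n * math.exp(-n * n / (4.0 * t)) for n in range(1, terms + 1))
    return total / (2.0 * math.sqrt(math.pi) * t ** 1.5)

print(residuals)
print([u_series(t) for t in (0.5, 1.0, 5.0, 20.0)])
```

All roots found this way lie on the negative real axis, outside the half-plane where the defining integral converges, and every term of the series is positive, so no oscillatory component appears.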

From now on we shall consistently denote by $\varphi(s)$ the function defined by the
integral (3.4) and by the usual process of analytic continuation; accordingly we
shall take into consideration all roots of (6.1). The main limitation of Lotka’s
theory can then be formulated in the following way: Lotka’s method depends
only on the function $g(t)$ and on the roots of (6.1). Now two different functions
$f(t)$ can lead to characteristic equations having the same roots. Lotka’s method
would be applicable to both only if the corresponding two integral equations
(1.1) had the same solution $u(t)$. This, however, is not necessarily the case.
Thus, if Lotka’s method is applied, and if all computations are correctly
performed, and if the resulting series for $u(t)$ converges uniformly, there is no
possibility of telling which equation is really satisfied by the resulting $u(t)$:
it can happen that one has unwittingly solved some unknown equation of type
(1.1) which, by chance, leads to a characteristic equation having the same roots
as the characteristic equation of the integral equation with which one was really
concerned. Indeed this happens in the following example which is familiar in
connection with our problem. It is illustrative also for other purposes: thus it
shows not only limitations of Lotka’s method, but also that this method can be
modified so as to become applicable in some cases where the characteristic
equation has only a finite number of roots.


where $\psi(x, a)$ is independent of $n$. To my mind Hadwiger’s result shows only the
impracticability of this axiom. However, Hadwiger’s result is not correct even under his
assumption. Indeed, he derives for $\psi(x, a)$ the functional equation

(**)    $\psi(x, a + b) = \int_0^x \psi(x - \xi, a)\, \psi(\xi, b)\,d\xi,$

which is well-known from the theory of stochastic processes. Now Hadwiger merely
verifies the known result that (*) leads to a solution of (**). However, (**) has infinitely
many other solutions (it is possible to write down expressions for their Laplace transforms,
although it is difficult to express the solutions themselves explicitly). This, of course,
renders Hadwiger’s result illusory.






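Example: Pearson type III-curves.²⁰ Consider the integral equation (1.1)
in the following two cases:

(I)    $f(t) = g(t) = f_I(t) = \frac{2}{\sqrt{\pi}}\, t^{1/2} e^{-t}$

and

(II)    $f(t) = g(t) = f_{II}(t) = \tfrac{1}{2}\, t^2 e^{-t}.$

It is readily seen (and well known) that the corresponding Laplace transforms are

(I)    $\varphi_I(s) = \frac{1}{(s + 1)^{3/2}}$

and

(II)    $\varphi_{II}(s) = \frac{1}{(s + 1)^3},$

respectively. Thus in both cases the characteristic equation has the same roots,
namely

$s_1 = 0, \qquad s_{2,3} = -\tfrac{3}{2} \pm \tfrac{i}{2}\sqrt{3},$

of which only the first one lies in the half-plane of convergence of the integral
(3.4). Lotka’s method is not applicable since there are only three roots.
However, in the second case, an expansion of type (6.2) is possible. Indeed, we have
by (3.4)

$\omega_{II}(s) = \frac{\varphi_{II}(s)}{1 - \varphi_{II}(s)} = \frac{1}{s^3 + 3s^2 + 3s} = \frac{1}{3s} + \frac{-\frac{1}{6} + \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} - \frac{i}{2}\sqrt{3}} + \frac{-\frac{1}{6} - \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} + \frac{i}{2}\sqrt{3}};$

now $1/(s + a)$ is the Laplace transform of $e^{-at}$, and hence we obtain the solution
$u(t)$ in the form

$u_{II}(t) = \frac{1}{3} + \left( -\frac{1}{6} + \frac{i}{2\sqrt{3}} \right) e^{\left( -\frac{3}{2} + \frac{i}{2}\sqrt{3} \right) t} + \left( -\frac{1}{6} - \frac{i}{2\sqrt{3}} \right) e^{\left( -\frac{3}{2} - \frac{i}{2}\sqrt{3} \right) t},$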



20 General Pearson curves have been investigated recently in connection with (1.1) by
Brown [1], Hadwiger and Ruchti [6] and Rhodes [15]. Hadwiger and Ruchti use a method
of their own, but they are also led to the study of the characteristic equation (6.1) in a
slightly disguised form; their result needs a modification since they arbitrarily drop the
roots lying in the half-plane of divergence of the integral $\varphi(s)$.





which is an expansion of type (6.2). In the first of the above examples we get
for real positive $s$

$\omega_I(s) = \frac{\varphi_I(s)}{1 - \varphi_I(s)} = \sum_{n=1}^{\infty} \frac{1}{(s + 1)^{3n/2}},$

and it is readily seen that this is the Laplace transform of the solution

$u_I(t) = e^{-t} \sum_{n=1}^{\infty} \frac{1}{\Gamma(3n/2)}\, t^{(3n-2)/2}.$

The series is convergent for $t > 0$, but obviously this solution cannot be
represented in a form similar to (6.2).
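For case (II) the finite expansion can be verified directly. The sketch below is illustrative only: it takes the coefficients from the residue formula $A_k = -\gamma(s_k)/\varphi'(s_k)$ with $\gamma = \varphi = (s+1)^{-3}$, and then checks by a trapezoidal quadrature that the resulting three-term sum satisfies $u(t) = f(t) + \int_0^t u(t-x) f(x)\,dx$. The names `u_lotka` and `residual` and the step count are assumptions of the sketch.

```python
import cmath
import math

def f(t):
    """Case (II): f(t) = g(t) = t^2 e^{-t} / 2, whose transform is 1/(s+1)^3."""
    return 0.5 * t * t * math.exp(-t)

# The three roots of (s + 1)^3 = 1.
roots = [cmath.exp(2j * math.pi * k / 3) - 1.0 for k in range(3)]
# With gamma = phi: A_k = -1/phi'(s_k), where phi'(s) = -3/(s + 1)^4.
coeffs = [(s + 1.0) ** 4 / 3.0 for s in roots]

def u_lotka(t):
    """Sum of A_k exp(s_k t); the imaginary parts cancel between conjugate roots."""
    return sum(a * cmath.exp(s * t) for a, s in zip(coeffs, roots)).real

def residual(t, steps=2000):
    """u(t) - f(t) - int_0^t u(t - x) f(x) dx, with a trapezoidal quadrature."""
    hx = t / steps
    vals = [u_lotka(t - i * hx) * f(i * hx) for i in range(steps + 1)]
    integral = hx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return u_lotka(t) - f(t) - integral

print(coeffs)
print([residual(t) for t in (0.5, 2.0, 5.0)])
```

The first coefficient is $1/3 = b/m_1$ (here $b = 1$, $m_1 = 3$), and the two complex terms, which belong to roots outside the half-plane of convergence, cannot be dropped: without them the sum would not vanish at $t = 0$ as $f(0) = 0$ requires.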

A similar remark applies to the general Pearson-type III curve

$f(t) = A\, t^{\beta} e^{-\alpha t},$

where $A$, $\alpha$, $\beta$ are positive constants; the corresponding Laplace transform is

$\varphi(s) = A\, \Gamma(\beta + 1)\, \frac{1}{(s + \alpha)^{\beta + 1}}.$

These preparatory remarks enable us to formulate rigorous conditions for the
existence of an expansion of type (6.2). The following theorem shows the limits
of Lotka’s method, but at the same time it also represents an extension of it.
In the formulation of the theorem we have considered only the case of absolute
convergence of (6.2). This was done to avoid complications lacking any
practical significance whatsoever. The conditions can, of course, be relaxed along
customary lines.

Theorem 6: In order that the solution $u(t)$ of Theorem 2 be representable in
form (6.2), where the series converges absolutely for $t \ge 0$ and where the $s_k$ denote the
roots²¹ of the characteristic equation (6.1), it is necessary and sufficient that the
Laplace transform $\omega(s)$ admit an expansion

(6.6)    $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)} = \sum_k \frac{A_k}{s - s_k}$

and that $\sum |A_k|$ converges absolutely. The coefficients $A_k$ are determined by

(6.7)    $A_k = -\frac{\gamma(s_k)}{\varphi'(s_k)}.$

In particular, it is necessary that $\omega(s)$ be a one-valued function.²²

Proof: All roots $s_k$ of (6.1) satisfy the inequality $\Re(s_k) \le \sigma'$, where $\sigma'$ was
defined in Theorem 2. It is therefore readily seen that in case $\sum |A_k|$
converges, the Laplace transform of (6.2) can be computed for sufficiently large
positive $s$-values by termwise integration so that (6.6) certainly holds for
sufficiently large positive $s$. Now with $\sum |A_k|$ converging, (6.6) defines $\omega(s)$
uniquely for all complex $s$ (with singularities at the points $s_k$ and the points of
accumulation of $s_k$, if any). Since the analytic continuation is unique, it follows
that (6.6) holds for all $s$. The series $\sum |A_k|$ must, of course, converge if (6.2)
is to converge absolutely for $t = 0$, and this proves the necessity of our
condition. Conversely, if $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$ is given by (6.6), and if $\sum |A_k|$
converges, then $\omega(s)$ is the Laplace transform of a function $u(t)$ defined by (6.2).
Since the Laplace transform is unique, $u(t)$ is the solution of (1.1) by Theorem 2.
The series (6.2) converges absolutely for $t \ge 0$ since $|A_k e^{s_k t}| \le |A_k| e^{\sigma' t}$.
Finally (6.7) follows directly from (6.6).

21 The number of roots may be finite or infinite. It should also be noted that it is not
required that $s_k \to \infty$. If the $s_k$ have a point of accumulation, $\omega(s)$ will have an essential
singularity. That this actually can happen can be shown by examples.

22 This was not so in our example I.

It is interesting to compare (6.7) with formulas (50) and (50) of Lotka’s
paper [8]. Lotka considers the special case $g(t) = f(t)$; in this case $\gamma(s_k) =
\varphi(s_k) = 1$, and (6.7) reduces to $A_k = -\frac{1}{\varphi'(s_k)}$. If $s_k$ lies in the domain of
convergence of the integral $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt$, that is, if $\Re(s_k) > \sigma$, then

(6.8)    $A_k = \frac{1}{\int_0^\infty e^{-s_k t}\, t f(t)\,dt},$

in accordance with Lotka’s result. However, (6.8) becomes meaningless for the
roots with $\Re(s_k) < \sigma$, whereas (6.7) is applicable in all cases.
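The difference between the two coefficient formulas can be seen on a small worked case. The choice $f(t) = g(t) = t e^{-t}$ is an illustrative assumption and is not taken from the paper; here the defining integral converges for $\Re(s) > -1$.

```latex
% Worked example (assumed data): f(t) = g(t) = t e^{-t}.
\[
  \varphi(s) = \gamma(s) = \frac{1}{(s+1)^2}, \qquad
  \varphi(s) = 1 \iff s_1 = 0,\; s_2 = -2, \qquad
  \varphi'(s) = -\frac{2}{(s+1)^3}.
\]
% Root s_1 = 0 lies in the half-plane of convergence: both formulas agree.
\[
  A_1 = -\frac{1}{\varphi'(0)} = \frac{1}{2}
      = \frac{1}{\int_0^\infty t \cdot t e^{-t}\,dt} = \frac{1}{m_1}.
\]
% Root s_2 = -2 lies outside it: the integral \int_0^\infty e^{2t} t^2 e^{-t} dt
% diverges, but the residue formula still applies.
\[
  A_2 = -\frac{1}{\varphi'(-2)} = -\frac{1}{2}.
\]
% Resulting expansion, consistent with omega(s) = 1/(s(s+2)):
\[
  \omega(s) = \frac{1}{s(s+2)} = \frac{1/2}{s} - \frac{1/2}{s+2},
  \qquad
  u(t) = \tfrac12 - \tfrac12 e^{-2t}.
\]
```

Dropping the root $s_2 = -2$ would give $u(t) = \tfrac12$, which does not satisfy $u(0) = g(0) = 0$.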

Theorem 6 can easily be generalized to the case where the characteristic
equation has multiple roots. The expansion (6.6) (which reduces to the customary
expansion into partial fractions whenever $\omega(s)$ is meromorphic) is to be
replaced by

(6.9)    $\omega(s) = \sum_k \left\{ \frac{A_k^{(1)}}{s - s_k} + \frac{A_k^{(2)}}{(s - s_k)^2} + \cdots + \frac{A_k^{(m_k)}}{(s - s_k)^{m_k}} \right\},$

where $m_k$ is the multiplicity of the root $s_k$. This leads us formally to an
expansion

(6.10)    $u(t) = \sum_k e^{s_k t} \left\{ A_k^{(1)} + A_k^{(2)} \frac{t}{1!} + \cdots + A_k^{(m_k)} \frac{t^{m_k - 1}}{(m_k - 1)!} \right\},$

which now replaces (6.2). Generalizing Theorem 6 it is easy to formulate some
simple conditions under which (6.10) will really represent a solution of (1.1).
Other conditions which ensure that (6.9) is the transform of (6.10) are known
from the general theory of Laplace transforms; such conditions usually use only
function-theoretical properties of (6.9) and are applicable in particular when
$\omega(s)$ is meromorphic. We mention in particular a theorem of Churchill [17]
which can be used for our purposes.


7. On the practical computation of the solution. There are at hand two main
methods for the practical computation of the solution of (1.1). One of them
has been developed by Lotka and consists of an approximate computation of a
few coefficients in the series (6.2). The other method uses an expansion

(7.1)    $u(t) = \sum_{n=0}^{\infty} u_n(t),$

where $u_n(t)$ represents the contribution of the nth “generation” and is defined
by

(7.2)    $u_0(t) = g(t), \qquad u_{n+1}(t) = \int_0^t u_n(t - x) f(x)\,dx.$

Now the Laplace transform of $u_n(t)$ is $\gamma(s)\varphi^n(s)$, and hence (7.1) corresponds
to the expansion

(7.3)    $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)} = \gamma(s) \sum_{n=0}^{\infty} \varphi^n(s).$

In practice the functions $g(t)$ and $f(t)$ are usually not known exactly.
Frequently their values are obtained from some statistical material, so that only
their integrals over some time units, e.g. years, are actually known or, in other
words, only the values

(7.4)    $f_n = \frac{1}{\delta} \int_{n\delta}^{(n+1)\delta} f(t)\,dt, \qquad g_n = \frac{1}{\delta} \int_{n\delta}^{(n+1)\delta} g(t)\,dt$

are given, where $\delta > 0$ is a given constant. Ordinarily in such cases some
theoretical forms (e.g. Pearson curves) are fitted to the empirical data and
equation (1.1) is solved with these theoretical functions. Now such a procedure
is sometimes not only very troublesome, but also somewhat arbitrary.
Consider for example the limit of $u(t)$ as $t \to \infty$; this asymptotic value is the main
point of interest of the theory and all practical computations. However, as has
been shown above, this limit depends only on the moments of the first two
orders of $f(t)$ and $g(t)$, and, unless the fitting is done by the method of moments,
the resulting value will depend on the special procedure of fitting. Accordingly
it will sometimes happen that it is of advantage to use the empirical material
as it is, and this can, at least in principle, always be done.

If only the values (7.4) are used it is natural to consider $f(t)$ and $g(t)$ as
step-functions defined by

(7.5)    $f(t) = f_n, \quad g(t) = g_n \qquad \text{for } n\delta \le t < (n + 1)\delta.$

In practice only a finite number among the $f_n$ and $g_n$ will be different from zero:
accordingly the Laplace transforms $\gamma(s)$ and $\varphi(s)$ reduce to trigonometrical
polynomials, so that the analytic study of $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$ becomes particularly
simple. Lotka’s method can be applied directly in this case.





For a convenient computation of (7.1) it is better to return to the more general
equation (1.3), instead of (1.1). The summatory functions $F(t)$ and $G(t)$ should
not be defined by (1.2) in this case, but simply by

(7.6)    $F(t) = \sum_{n=0}^{[t/\delta]} f_n, \qquad G(t) = \sum_{n=0}^{[t/\delta]} g_n.$

It is readily seen that the solution $U(t)$ of (1.3) can be written in the form

$U(t) = \sum_{n=0}^{\infty} U_n(t), \quad \text{where} \quad U_0(t) = G(t), \quad U_{n+1}(t) = \int_0^t U_n(t - x)\,dF(x);$

in our case $U_n(t)$ will again be a step-function with jumps at the points $k\delta$, the
corresponding saltus being

$u_0^{(k)} = g_k, \qquad u_{n+1}^{(k)} = \sum_{r=0}^{k} u_n^{(k-r)} f_r.$

Thus we arrive at exactly the same result as would have been obtained if the
integrals (7.2) had been computed, starting from (7.4), by the ordinary methods
for numerical integration of tabulated functions. It is of interest to note that
this method of approximate evaluation of the integrals (7.2) leads to the exact
values of the renewal function of a population where all changes occur in a
discontinuous way at the end of time intervals of length $\delta$ in such a way that each
change equals the mean value of the changes of the given population over the
corresponding time interval.
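The generation-by-generation computation for step data can be sketched as follows. This is an illustration under assumptions: the arrays `f` and `g` are hypothetical, `f[r]` is taken to be the mass attached to the r-th interval (so that the masses sum to 1), and the function names are not from the paper. The sketch sums the saltus sequences of the successive generations and compares the total with the direct recursion $u^{(k)} = g_k + \sum_{r=0}^{k} u^{(k-r)} f_r$.

```python
def generations(f, g, n_gen):
    """Return [u_0, u_1, ..., u_{n_gen}], where u_n[k] is the saltus of the
    n-th generation at the point k*delta: u_0 = g and
    u_{n+1}[k] = sum_{r=0}^{k} u_n[k - r] * f[r]."""
    size = len(g)
    gens = [list(g)]
    for _ in range(n_gen):
        prev = gens[-1]
        nxt = [sum(prev[k - r] * f[r] for r in range(min(k, len(f) - 1) + 1))
               for k in range(size)]
        gens.append(nxt)
    return gens

def direct(f, g):
    """Solve u[k] = g[k] + sum_{r=0}^{k} u[k - r] * f[r] recursively (needs f[0] < 1)."""
    u = []
    for k in range(len(g)):
        acc = sum(u[k - r] * f[r] for r in range(1, min(k, len(f) - 1) + 1))
        u.append((g[k] + acc) / (1.0 - f[0]))
    return u

f = [0.0, 0.5, 0.3, 0.2]        # hypothetical masses per interval, summing to 1
g = [1.0] + [0.0] * 29          # hypothetical initial data
gens = generations(f, g, n_gen=len(g))
total = [sum(gen[k] for gen in gens) for k in range(len(g))]
u = direct(f, g)
mean = sum(r * f[r] for r in range(len(f)))
print(u[-1], sum(g) / mean)
```

Because `f[0] = 0` in this example, the n-th generation contributes nothing before the n-th interval, so the sum over finitely many generations is exact on the grid; the last value is already close to the ratio of the total of `g` to the mean of `f`, the discrete counterpart of $b/m_1$.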


REFERENCES

I. Papers on the integral equation of renewal theory

Note: Lotka’s paper [8] contains a list of 74 papers on the subject published before 1939.
The following list is to bring Lotka’s list up to June 1941; however no claims to completeness
are made.

[1] A. W. Brown, “A note on the use of a Pearson type III function in renewal theory,”
Annals of Math. Stat., Vol. 11 (1940), pp. 418-453.

[2] H. Hadwiger, “Zur Frage des Beharrungszustandes bei kontinuierlich sich erneuernden
Gesamtheiten,” Archiv f. mathem. Wirtschafts- und Sozialforschung, Vol. 5 (1939), pp. 32-34.

[3] H. Hadwiger, “Über die Integralgleichung der Bevölkerungstheorie,” Mitteilungen
Verein. schweizer. Versicherungsmathematiker (Bull. Assoc. Actuaires suisses),
Vol. 38 (1939), pp. 1-14.

[4] H. Hadwiger, “Eine analytische Reproduktionsfunktion für biologische Gesamtheiten,”
Skand. Aktuarietidskrift (1940), pp. 101-113.

[5] H. Hadwiger, “Natürliche Ausscheidefunktionen für Gesamtheiten und die Lösung
der Erneuerungsgleichung,” Mitteilungen Verein. Schweiz. Versich.-Math.,
Vol. 40 (1940), pp. 31-39.

[6] H. Hadwiger and W. Ruchti, “Über eine spezielle Klasse analytischer Geburtenfunktionen,”
Metron, Vol. 13 (1939), No. 4, pp. 17-26.

[7] A. Linder, “Die Vermehrungsrate der stabilen Bevölkerung,” Archiv f. mathem.
Wirtschafts- und Sozialforschung, Vol. 4 (1938), pp. 136-156.

[8] A. Lotka, “A contribution to the theory of self-renewing aggregates, with special
reference to industrial replacement,” Annals of Math. Stat., Vol. 10 (1939), pp. 1-25.

[9] A. Lotka, “On an integral equation in population analysis,” Annals of Math. Stat.,
Vol. 10 (1939), pp. 144-161.

[10] A. Lotka, “Théorie analytique des associations biologiques II,” Actualités
Scientifiques No. 180, Paris, 1939.

[11] A. Lotka, “The theory of industrial replacement,” Skand. Aktuarietidskrift (1940),
pp. 1-14.

[12] A. Lotka, “Sur une équation intégrale de l’analyse démographique et industrielle,”
Mitt. Verein. Schweiz. Versich.-Math., Vol. 40 (1940), pp. 1-16.

[13] H. Münzner, “Die Erneuerung von Gesamtheiten,” Archiv f. math. Wirtschafts- u.
Sozialforschung, Vol. 4 (1938).

[14] G. A. D. Preinreich, “The theory of industrial replacement,” Skand. Aktuarietidskrift
(1939), pp. 1-19.

[15] E. C. Rhodes, “Population mathematics, I, II, III,” Roy. Stat. Soc. Jour., Vol. 103
(1940), pp. 61-89, 218-245, 362-387.

[16] H. Richter, “Die Konvergenz der Erneuerungsfunktion,” Blätter f.
Versicherungsmathematik, Vol. 5 (1940), pp. 21-35.

[16a] H. Hadwiger, “Über eine Funktionalgleichung der Bevölkerungstheorie und eine
spezielle Klasse analytischer Lösungen,” Bl. f. Versicherungsmathematik, Vol. 5
(1941), pp. 181-188.

[16b] G. A. D. Preinreich, “The present status of renewal theory,” Waverly Press,
Baltimore (1940).


II. Other papers quoted

[17] R. V. Churchill, “The inversion of the Laplace transformation by a direct expansion
in series and its application to boundary-value problems,” Math. Zeits., Vol. 42
(1937), pp. 567-579.

[18] G. Doetsch, Theorie und Anwendung der Laplace-Transformation, J. Springer, Berlin,
1937.

[19] W. Feller, “Completely monotone functions and sequences,” Duke Math. Jour.,
Vol. 5 (1939), pp. 661-674.

[20] A. Haar, “Über asymptotische Entwicklungen von Funktionen,” Math. Ann., Vol. 96
(1927), pp. 69-107.

[21] R. E. A. C. Paley and N. Wiener, “Notes on the theory and application of Fourier
transforms, VII. On the Volterra equation,” Amer. Math. Soc. Trans., Vol. 35
(1933), pp. 785-791.



ON THE JOINT DISTRIBUTION OF THE MEDIANS IN SAMPLES 
FROM A MULTIVARIATE POPULATION 

By A. M. Mood
University of Texas 

It is well known [1] that in the case of a population having a single variate
distributed according to a density function satisfying certain general conditions,
the median of a sample is asymptotically normally distributed about the
population median as a mean. It is the purpose of this paper to extend this result to
populations involving more than one variate. Besides the theoretical interest
of such a result, there may be some practical value in it when one is dealing with
samples from a population for which the median is a more efficient statistic than
the mean, as, for example, when the population variance is not finite.

The complexity of the exact distribution of the sample median increases
rapidly with the number of variates which describe the population; it is almost
impossible to write out completely the distribution for the general case of $k$
variates. For this reason the author has chosen to give first a detailed
presentation for the case of two variates, then use a condensed notation to establish the
general result. This is a circuitous route, but it seems to be the only feasible one.
A condensed notation is necessary for the general case, but presented alone it
would be well-nigh incomprehensible.

1. Distribution of the median in two dimensions. An extension of A. T. Craig's [2] geometrical argument will be used to obtain the exact distribution of the sample median. Let us consider two variates $x_1$ and $x_2$ with density function $f(x_1, x_2)$ which shall satisfy the following conditions:

1. $f(x_1, x_2) > 0$

&(&' x ‘ ) ix ' = L M ** + 0 Gr) 

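3. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$

4. Each of the equations

\[ \int_{-\infty}^{x_1}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1 = \tfrac12, \qquad \int_{-\infty}^{x_2}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = \tfrac12 \]

has a unique real root.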


If $\xi_1$ and $\xi_2$ are the respective roots of the two equations of this last condition then the point $(\xi_1, \xi_2)$ is defined to be the population median. It will be assumed in what follows that the coordinate system has been so chosen that $\xi_1 = 0 = \xi_2$.

Let a sample of $2n + 1$ elements $(x_{1\alpha}, x_{2\alpha})$ $(\alpha = 1, 2, \cdots, 2n + 1)$ be drawn from this population. The sample median $(\tilde{x}_1, \tilde{x}_2)$ will be defined as an element (not necessarily in the sample) whose $x_1$ coordinate is the middle, with respect to magnitude, number of the set of numbers $x_{1\alpha}$, and whose $x_2$ coordinate is the middle number of the set of numbers $x_{2\alpha}$. Now let us compute the probability that the sample median will lie in the rectangle

\[ \tilde{x}_i - \tfrac12\,d\tilde{x}_i < x_i < \tilde{x}_i + \tfrac12\,d\tilde{x}_i, \qquad i = 1, 2. \]

This rectangle will be denoted by $R''$. The remainder of the plane will be divided into eight other regions $R_1, \cdots, R_4, R_1', \cdots, R_4'$, as indicated by the dotted lines in Figure 1. The probability that an element will fall in the region $R_i^{(j)}$ will be denoted by

\[ p_i^{(j)} = \iint_{R_i^{(j)}} f(x_1, x_2)\,dx_1\,dx_2 . \]
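The definition of the sample median just given can be illustrated with a minimal Python sketch. The function name `coordinatewise_median` and the three sample points are hypothetical and chosen only for illustration; the sketch assumes an odd sample size, as the paper does.

```python
from statistics import median

def coordinatewise_median(sample):
    """Median point of an odd-sized sample of k-dimensional points.

    Each coordinate is the middle value of that coordinate over the sample,
    so the result need not coincide with any sample point.
    """
    if len(sample) % 2 == 0:
        raise ValueError("the paper assumes an odd sample size 2n + 1")
    # zip(*sample) regroups the points into one tuple per coordinate.
    return tuple(median(column) for column in zip(*sample))

points = [(1.0, 9.0), (2.0, 1.0), (3.0, 5.0)]   # hypothetical sample, n = 1
print(coordinatewise_median(points))            # (2.0, 5.0)
```

In this toy sample the median point (2.0, 5.0) is determined by two different elements, which is the situation treated in the sums that follow the first one in the exact distribution.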




[Fig. 1. The plane divided by dotted lines into the rectangle $R''$ and the regions $R_i$ and $R_i'$.]


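Neglecting terms involving differentials of higher order we have

\[
\begin{aligned}
p_1 &= \int_{\tilde{x}_1}^{\infty}\int_{\tilde{x}_2}^{\infty} f(x_1, x_2)\,dx_2\,dx_1, \\
p_2 &= \int_{-\infty}^{\tilde{x}_1}\int_{\tilde{x}_2}^{\infty} f(x_1, x_2)\,dx_2\,dx_1, \\
p_1' &= \int_{\tilde{x}_1}^{\infty} f(x_1, \tilde{x}_2)\,dx_1\,d\tilde{x}_2, \\
p'' &= f(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 .
\end{aligned}
\tag{1}
\]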








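We shall consider now that the sample is drawn from a multinomial population with probabilities $p_1, \cdots, p''$ and pick out those terms which give rise to a sample median in $R''$. If the median is an element of the sample, then that element must fall in $R''$ and the other elements must fall in the regions $R_1$, $R_2$, $R_3$, and $R_4$ in such a manner that

\[ n_1 + n_2 = n_3 + n_4 = n, \qquad n_1 + n_4 = n_2 + n_3 = n, \]

or so that

\[ n_1 = n_3 \quad\text{and}\quad n_2 = n_4, \tag{2} \]

where $n_i$ is the number of elements in $R_i$. The probability that this occurs is

\[ \sum_{n_1+n_2=n} \frac{(2n+1)!}{n_1!^2\,n_2!^2}\,(p_1 p_3)^{n_1} (p_2 p_4)^{n_2}\,p''. \tag{3} \]

Now suppose the median is determined by two different elements of the sample, for example one in $R_1'$ and one in $R_2'$; then there must be $n_1$ elements in $R_1$, $n_1 + 1$ elements in $R_3$, and $n_2$ elements in each of $R_2$ and $R_4$ with

\[ n_1 + n_2 = n - 1. \tag{4} \]

The probability in this case is

\[ p_1' p_2' \sum_{n_1+n_2=n-1} \frac{(2n+1)!}{n_1!\,(n_1+1)!\,n_2!^2}\; p_1^{n_1} p_3^{n_1+1} p_2^{n_2} p_4^{n_2}. \tag{5} \]

Continuing in this manner we obtain the distribution of the median, and letting $D(\tilde{x}_1, \tilde{x}_2)$ represent the density function giving this distribution we have

\[
\begin{aligned}
D(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 ={}& p'' \sum \frac{(2n+1)!}{n_1!^2\,n_2!^2}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2} \\
&+ (p_1'p_2'p_3 + p_3'p_4'p_1) \sum \frac{(2n+1)!}{n_1!\,(n_1+1)!\,n_2!^2}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2} \\
&+ (p_2'p_3'p_4 + p_4'p_1'p_2) \sum \frac{(2n+1)!}{n_1!^2\,n_2!\,(n_2+1)!}\,(p_1p_3)^{n_1}(p_2p_4)^{n_2},
\end{aligned}
\tag{6}
\]

where the first sum is taken over $n_1 + n_2 = n$ and the others over $n_1 + n_2 = n - 1$.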

2. Asymptotic distribution of the median in two dimensions. As a simple notation

\[ \eta = B\,(1 + O(1/\sqrt{n})) \]

will be abbreviated to read

\[ \eta =\cdot\; B, \tag{7} \]

the dot after the equality sign indicating the omission of the factor $1 + O(1/\sqrt{n})$. As is customary, the second term of this factor represents any function such that

\[ \lim_{N\to\infty} N\,O(1/N) = L < \infty . \]

In order to get an approximation to (6) for large $n$ we shall use the normal approximation for the multinomial distribution and compute the sums (these cannot be put in finite form) by integration. We use then the well-known result

\[ m! \prod_{i=1}^{r} \frac{p_i^{m_i}}{m_i!} =\cdot\; \left[\frac{A}{(2\pi)^{r-1}}\right]^{1/2} \exp\Big(-\tfrac12 \sum_{i,j=1}^{r-1} A_{ij} z_i z_j\Big) \prod_{i=1}^{r-1} dz_i, \tag{8} \]

where

\[ z_i = (m_i - m p_i)/\sqrt{m}, \qquad i = 1, 2, \cdots, r-1, \tag{9} \]

\[ A_{ij} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_r}, \qquad \sum_{i=1}^{r} p_i = 1. \tag{10} \]
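The quality of the approximation (8) can be checked numerically. The following is a minimal sketch, not part of the original argument; it assumes that $A$ in (8) denotes the determinant of the matrix $(A_{ij})$ (by analogy with $B = |B_{\alpha\beta}|$ used later) and that each $dz_i$ stands for the lattice spacing $1/\sqrt{m}$. The function names and the numerical values are hypothetical.

```python
import math

def multinomial_pmf(counts, probs):
    """Exact multinomial probability m!/prod(m_i!) * prod(p_i**m_i), via log-gamma."""
    m = sum(counts)
    log_p = math.lgamma(m + 1)
    for m_i, p_i in zip(counts, probs):
        log_p += m_i * math.log(p_i) - math.lgamma(m_i + 1)
    return math.exp(log_p)

def normal_approx(counts, probs):
    """Right-hand side of (8), with every dz_i replaced by 1/sqrt(m)."""
    m = sum(counts)
    r = len(probs)
    z = [(m_i - m * p_i) / math.sqrt(m) for m_i, p_i in zip(counts, probs)]
    # With A_ij = delta_ij/p_i + 1/p_r, the quadratic form over i, j < r equals
    # sum over all r cells of z_i^2/p_i (since z_r = -(z_1 + ... + z_{r-1})),
    # and the determinant |A_ij| equals 1/(p_1 * ... * p_r).
    quad = sum(z_i * z_i / p_i for z_i, p_i in zip(z, probs))
    det_a = 1.0 / math.prod(probs)
    density = math.sqrt(det_a / (2 * math.pi) ** (r - 1)) * math.exp(-0.5 * quad)
    return density * m ** (-(r - 1) / 2)

probs = (0.2, 0.3, 0.5)
at_mode = (60, 90, 150)      # m = 300, counts equal to m * p_i
off_mode = (66, 84, 150)
ratio_mode = multinomial_pmf(at_mode, probs) / normal_approx(at_mode, probs)
ratio_off = multinomial_pmf(off_mode, probs) / normal_approx(off_mode, probs)
print(ratio_mode, ratio_off)   # both ratios should be close to 1
```

The ratios differ from unity by terms that shrink as $m$ grows, which is the content of the dotted equality sign in (7).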

Returning to (6) it is to be noted that the fraction immediately following $\Sigma$ in the first sum has one more factor in the denominator than the corresponding fractions in the other sums. This first sum may therefore be neglected in the asymptotic form as it is of order $1/n$ in comparison with the others. We consider now the second sum in (6) and let it be represented by the letter $S$:


\[ S = (2n+1)(2n)\,p_1' p_2' \sum \frac{(2n-1)!}{n_1!\,n_2!\,n_3!\,n_4!}\; p_1^{n_1} p_2^{n_2} p_3^{n_3} p_4^{n_4}. \tag{11} \]

Employing (8) and omitting certain terms of order $1/n$ we have

\[ S =\cdot\; 4n^2 p_1' p_2' \sum \left[\frac{A}{(2\pi)^3}\right]^{1/2} \exp\Big(-\tfrac12 \sum_{i,j=1}^{3} A_{ij} z_i z_j\Big)\,dz_1\,dz_2\,dz_3, \tag{12} \]

in which the $A_{ij}$ are defined by (10) with $r = 4$, and

\[ z_i = (n_i - 2np_i)/\sqrt{2n}, \qquad i = 1, 2, 3. \tag{13} \]

In view of the relations (2) between the $n_i$ we have

\[
\begin{aligned}
z_2 &= \sqrt{2n}\,(\tfrac12 - p_1 - p_2) - z_1 = u_1 - z_1, \\
z_3 &= \sqrt{2n}\,(p_1 - p_3) + z_1 = u_2 + z_1,
\end{aligned}
\tag{14}
\]

in which relations we have defined the new symbols $u_1$ and $u_2$. It will be recalled that in (8) the factors $dz_i$ correspond to factors $1/\sqrt{m}$; we therefore let $dz_2$ and $dz_3$ in (12) cancel a factor $2n$ from the coefficient of the exponential, and after substituting (14) in (12) find that

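\[
S =\cdot\; 2n\,p_1'p_2' \sum \left[\frac{A}{(2\pi)^3}\right]^{1/2} \exp\Big\{-\tfrac12\Big[ z_1^2\Big(\frac1{p_1}+\frac1{p_2}+\frac1{p_3}+\frac1{p_4}\Big) + 2z_1\Big(\frac{u_1+u_2}{p_4} - \frac{u_1}{p_2} + \frac{u_2}{p_3}\Big) + \frac{(u_1+u_2)^2}{p_4} + \frac{u_1^2}{p_2} + \frac{u_2^2}{p_3}\Big]\Big\}\,dz_1 .
\tag{15}
\]

The summation can now be performed to within terms of order $1/\sqrt{n}$ by integration with respect to $z_1$ between the limits $-\infty$ and $+\infty$; this gives us

\[
S =\cdot\; \frac{2n\,p_1'p_2'}{2\pi} \Big[p_1p_2p_3p_4\Big(\frac1{p_1}+\frac1{p_2}+\frac1{p_3}+\frac1{p_4}\Big)\Big]^{-1/2} \exp\Big\{-\tfrac12\Big[\frac{u_1^2}{p_2} + \frac{u_2^2}{p_3} + \frac{(u_1+u_2)^2}{p_4} - \Big(\frac{u_1+u_2}{p_4} - \frac{u_1}{p_2} + \frac{u_2}{p_3}\Big)^2 \Big/ \Big(\frac1{p_1}+\frac1{p_2}+\frac1{p_3}+\frac1{p_4}\Big)\Big]\Big\}.
\tag{16}
\]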

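At this point some new symbols are required. We let $q_i$ and $q_i'$ represent the results of replacing $\tilde{x}_1$ and $\tilde{x}_2$ by zero in the integrals of the relations (1):

\[
\begin{aligned}
q_1 &= \int_0^{\infty}\int_0^{\infty} f(x_1, x_2)\,dx_2\,dx_1, & q_1' &= \int_0^{\infty} f(x_1, 0)\,dx_1, \\
q_2 &= \int_{-\infty}^{0}\int_0^{\infty} f(x_1, x_2)\,dx_2\,dx_1, & q_2' &= \int_0^{\infty} f(0, x_2)\,dx_2, \\
q_3 &= \int_{-\infty}^{0}\int_{-\infty}^{0} f(x_1, x_2)\,dx_2\,dx_1, & q_3' &= \int_{-\infty}^{0} f(x_1, 0)\,dx_1, \\
q_4 &= \int_0^{\infty}\int_{-\infty}^{0} f(x_1, x_2)\,dx_2\,dx_1, & q_4' &= \int_{-\infty}^{0} f(0, x_2)\,dx_2;
\end{aligned}
\tag{17}
\]

then

\[ q_1 + q_2 = q_3 + q_4 = \tfrac12, \qquad q_1 + q_4 = q_2 + q_3 = \tfrac12, \tag{18} \]

and

\[ q_1 = q_3, \qquad q_2 = q_4 . \tag{19} \]

Also we let

\[ a_1 = q_2' + q_4', \qquad a_2 = q_1' + q_3', \tag{20} \]

\[ y_1 = \sqrt{2n}\,a_1\tilde{x}_1, \qquad y_2 = \sqrt{2n}\,a_2\tilde{x}_2 . \tag{21} \]

We have now

\[
\begin{aligned}
p_i &=\cdot\; q_i, & i &= 1, 2, 3, 4, \\
p_i' &=\cdot\; q_i'\,d\tilde{x}_2, & i &= 1, 3, \\
p_i' &=\cdot\; q_i'\,d\tilde{x}_1, & i &= 2, 4.
\end{aligned}
\tag{22}
\]

Also

\[
\begin{aligned}
u_1 &= \sqrt{2n}\,(\tfrac12 - p_1 - p_2) \\
&= \sqrt{2n} \int_{-\infty}^{\infty}\int_0^{\tilde{x}_2} f(x_1, x_2)\,dx_2\,dx_1 \\
&= \sqrt{2n}\,\tilde{x}_2 \int_{-\infty}^{\infty} f(x_1, \theta\tilde{x}_2)\,dx_1, \qquad 0 < \theta < 1, \\
&=\cdot\; \sqrt{2n}\,\tilde{x}_2 \int_{-\infty}^{\infty} f(x_1, 0)\,dx_1 \\
&= \sqrt{2n}\,a_2\tilde{x}_2 \\
&= y_2 .
\end{aligned}
\tag{23}
\]

Similarly

\[ u_2 = \sqrt{2n}\,(p_1 - p_3) =\cdot\; -(y_1 + y_2). \tag{24} \]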

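The result of substituting (22), (23) and (24) in (16) with some further simplification using (18) and (19) is

\[ S =\cdot\; \frac{2n\,q_1'q_2'}{2\pi\sqrt{q_1q_2}} \exp\Big(-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\Big)\,d\tilde{x}_1\,d\tilde{x}_2 . \tag{25} \]

The other three sums of (6) will give rise to the same expression except that the factors $q_1'q_2'$ will be different; it is clear then that

\[
\begin{aligned}
D(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 &=\cdot\; \frac{2n\,(q_1'q_2' + q_2'q_3' + q_3'q_4' + q_4'q_1')}{2\pi\sqrt{q_1q_2}} \exp\Big(-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\Big)\,d\tilde{x}_1\,d\tilde{x}_2 \\
&= \frac{2n\,a_1a_2}{2\pi\sqrt{q_1q_2}} \exp\Big(-n\,\frac{a_1^2\tilde{x}_1^2 - 4(q_1 - q_2)\,a_1a_2\tilde{x}_1\tilde{x}_2 + a_2^2\tilde{x}_2^2}{4q_1q_2}\Big)\,d\tilde{x}_1\,d\tilde{x}_2
\end{aligned}
\tag{26}
\]

\[ = \frac{1}{2\pi\sqrt{q_1q_2}} \exp\Big(-\frac12\,\frac{y_1^2 - 4(q_1 - q_2)\,y_1y_2 + y_2^2}{4q_1q_2}\Big)\,dy_1\,dy_2 . \tag{27} \]

This is the asymptotic form for the distribution of the median in two dimensions.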
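The bivariate normal form obtained above can be read as a statement about the asymptotic variances and covariance of the two sample medians. The sketch below is an illustration only: the function names and the numerical inputs are hypothetical, and the variance $1/(8na_i^2)$ and correlation $2(q_1 - q_2)$ are obtained here by inverting the quadratic form of (26), using $q_1 + q_2 = \tfrac12$.

```python
import math

def median_asymptotics(q1, a1, a2, n):
    """Asymptotic variances, covariance and correlation of the two sample medians.

    q1 is the probability of the first quadrant (so q2 = 1/2 - q1), a1 and a2
    are the marginal densities at the population median, sample size is 2n + 1.
    """
    q2 = 0.5 - q1
    rho_star = 2.0 * (q1 - q2)           # correlation of the two medians
    var1 = 1.0 / (8.0 * n * a1 * a1)     # y_i = sqrt(2n) a_i x_i has variance 1/4
    var2 = 1.0 / (8.0 * n * a2 * a2)
    cov = rho_star * math.sqrt(var1 * var2)
    return var1, var2, cov, rho_star

def precision_from_26(q1, a1, a2, n):
    """Matrix P such that the exponent of (26) is -(1/2) x^T P x."""
    q2 = 0.5 - q1
    p11 = n * a1 * a1 / (2.0 * q1 * q2)
    p22 = n * a2 * a2 / (2.0 * q1 * q2)
    p12 = -n * (q1 - q2) * a1 * a2 / (q1 * q2)
    return p11, p12, p22

# Hypothetical inputs, chosen only to exercise the formulas.
q1, a1, a2, n = 0.35, 0.4, 0.25, 50
var1, var2, cov, rho_star = median_asymptotics(q1, a1, a2, n)
p11, p12, p22 = precision_from_26(q1, a1, a2, n)
# The covariance matrix times the precision matrix should be the identity.
print(var1 * p11 + cov * p12, var1 * p12 + cov * p22)
```

The check that covariance times precision gives the identity rests on the identity $1 - 4(q_1 - q_2)^2 = 16\,q_1q_2$, which holds whenever $q_1 + q_2 = \tfrac12$.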


3. Distribution of the median in k dimensions. We consider now a population characterized by a density function $f(x_1, \cdots, x_k)$ defined over a euclidean space of $k$ dimensions satisfying conditions like those required of $f(x_1, x_2)$ in section 1, and we assume that the population median is at the origin so that the integral of the density function over any half-space determined by a coordinate hyperplane is $\tfrac12$.

A sample of $2n + 1$ elements will have a median $(\tilde{x}_1, \cdots, \tilde{x}_k)$ each coordinate of which is the middle number of the set of numbers giving the corresponding coordinate of the elements of the sample. To obtain the probability that the sample median lies in the hyperparallelepiped $\tilde{x}_\alpha - \tfrac12 d\tilde{x}_\alpha < x_\alpha < \tilde{x}_\alpha + \tfrac12 d\tilde{x}_\alpha$ $(\alpha = 1, 2, \cdots, k)$, we divide the space into $3^k$ regions by means of hyperplanes perpendicular to the coordinate axes through the points $\tilde{x}_\alpha \pm \tfrac12 d\tilde{x}_\alpha$ on the coordinate axes. These regions are illustrated in Figure 2 for the case of three dimensions. The coordinate axes have been omitted in this figure. There will be $2^k$ primary regions denoted by $R_1, R_2, \cdots, R_{2^k}$ corresponding to the octants of the figure; $k\,2^{k-1}$ regions with one differential dimension denoted by $R_1', R_2', \cdots, R_{k2^{k-1}}'$ corresponding to the quarter slabs of the figure; $\binom{k}{2} 2^{k-2}$ regions with two differential dimensions corresponding to the half strips of the figure, and so forth. Probabilities associated with these regions are defined by

\[ p_i^{(j)} = \int_{R_i^{(j)}} f(x_1, \cdots, x_k)\,dx_1 \cdots dx_k . \]
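The region counts quoted above can be tabulated for any $k$. The short sketch below is an illustration only (the function name `region_counts` is hypothetical); it lists the number of regions with $j$ differential dimensions, $\binom{k}{j} 2^{k-j}$, and these counts add up to $3^k$.

```python
from math import comb

def region_counts(k):
    """Number of regions with j differential dimensions, for j = 0, 1, ..., k."""
    return [comb(k, j) * 2 ** (k - j) for j in range(k + 1)]

print(region_counts(2))   # [4, 4, 1]: quadrants, strips, the rectangle R''
print(region_counts(3))   # [8, 12, 6, 1]: octants, quarter slabs, half strips, the box
```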



If the sample median is determined by $k$ different elements of the sample there will be one of these $k$ elements in each of $k$ regions $R_i'$ whose differential dimensions are mutually orthogonal and the other elements of the sample will fall in the regions $R_i$ in such a way that $n$ elements of the sample will lie on either side of any of the $k$ hyperplanes $x_\alpha = \tilde{x}_\alpha$. The probability of this occurrence for a particular choice of $k$ of the regions $R_i'$ is

\[ \sum \frac{(2n+1)!}{\prod_{i=1}^{2^k} n_i!} \prod_{i=1}^{2^k} p_i^{n_i} \prod_{\alpha=1}^{k} p_\alpha', \tag{28} \]

in which the $2^k$ indices $n_i$ are subject to $k$ independent restrictions of the type

\[ \Sigma' n_i = n - c_\alpha, \tag{29} \]

where $c_\alpha$ is an integer such that $0 < c_\alpha < k$, and the prime on $\Sigma$ indicates that the sum is to be taken over all $n_i$ on one side of a hyperplane $x_\alpha = \tilde{x}_\alpha$. $n_i$ is the number of elements in $R_i$ and besides the $k$ conditions (29) we have also

\[ \sum_{i=1}^{2^k} n_i = 2n - k + 1. \tag{30} \]

In order to include all ways in which the median is determined by $k$ different elements of the sample we must add together $2^{k(k-1)}$ sums of the type (28). If the median is determined by fewer than $k$ elements, say $k - h$ elements, then the fraction $(2n+1)!/\prod n_i!$ will have $h$ extra factors in the denominator and hence the sum will be of order $1/n$ as compared with that of (28) and may be neglected in obtaining an asymptotic expression.

Thus we need only find the limiting form of (28)

\[ S = (2n+1)(2n)\cdots(2n-k+2) \prod_{\alpha=1}^{k} p_\alpha' \sum \frac{(2n-k+1)!}{\prod_i n_i!} \prod_i p_i^{n_i}, \]

which after substituting (8) and neglecting terms of lower order becomes

\[ S =\cdot\; (2n)^k \prod_{\alpha=1}^{k} p_\alpha' \sum \left[\frac{A}{(2\pi)^{2^k-1}}\right]^{1/2} \exp\Big(-\tfrac12 \sum\sum A_{ij} z_i z_j\Big) \prod dz_i, \tag{31} \]

in which the $A_{ij}$ are defined by (10) with $r = 2^k$ and

\[ z_i = (n_i - 2np_i)/\sqrt{2n}, \qquad i = 1, 2, \cdots, 2^k - 1. \tag{32} \]

Now we define

\[ u_\alpha = \sqrt{2n}\,(\tfrac12 - \Sigma' p_i), \qquad \alpha = 1, 2, \cdots, k, \tag{33} \]

the $\Sigma'$ having the same significance as in (29). These conditions (29) may now be put in the form

\[ z_\alpha = u_\alpha - L_\alpha(z), \]

in which $L_\alpha(z)$ is a sum of a certain subset of the variables $z_{k+1}, \cdots, z_{2^k-1}$. Care must be taken in labeling the regions $R_i$ in order to be able to solve for $z_1, \cdots, z_k$ in this form. After substituting these relations in (31) we replace $\prod_1^k dz_\alpha$ by $(1/2n)^{k/2}$ and perform the summation to within terms of order $1/\sqrt{n}$ by integrating the remaining $z_i$ from $-\infty$ to $+\infty$; the result is

\[ S =\cdot\; (2n/2\pi)^{k/2} \prod_{\alpha=1}^{k} p_\alpha' \sqrt{B}\, \exp\Big(-\tfrac12 \sum\sum B_{\alpha\beta} u_\alpha u_\beta\Big), \tag{34} \]

in which the $B_{\alpha\beta}$ are functions of the $p_i$, and $B = |B_{\alpha\beta}|$. As in (17) and (20) we define

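\[
\begin{aligned}
q_i &= \int_{\bar{R}_i} f(x_1, \cdots, x_k) \prod dx_\alpha, \\
q_i' &= \int_{\bar{R}_i'} f(x_1, \cdots, x_k) \prod{}' dx_\alpha, \\
a_\alpha &= \int_{x_\alpha = 0} f(x_1, \cdots, x_k) \prod{}' dx_\alpha = \Sigma' q_i',
\end{aligned}
\tag{35}
\]

in which $\bar{R}_i$ is the set of regions bounded by the coordinate hyperplanes; $\bar{R}_i'$ are the regions into which the coordinate hyperplanes are divided by the remaining coordinate hyperplanes. $\prod'$ indicates that one of the differentials is omitted and the variate corresponding to that differential is put equal to zero in $f(x_1, \cdots, x_k)$; $\Sigma'$ indicates the sum over all $q'$ determined by regions lying in the hyperplane $x_\alpha = 0$. It is clear that

\[
\begin{aligned}
p_i &=\cdot\; q_i, \\
\prod p_\alpha' &=\cdot\; \prod q_\alpha'\,d\tilde{x}_\alpha, \\
u_\alpha &=\cdot\; \sqrt{2n} \sum_\beta \epsilon_{\alpha\beta}\,a_\beta \tilde{x}_\beta = \sum_\beta \epsilon_{\alpha\beta}\,y_\beta,
\end{aligned}
\tag{36}
\]

where $\epsilon_{\alpha\beta} = \pm 1$ or $0$, and $y_\alpha = \sqrt{2n}\,a_\alpha\tilde{x}_\alpha$.

Making these substitutions in (34) we have

\[ S =\cdot\; (2n/2\pi)^{k/2} \prod q_\alpha' \sqrt{C}\, \exp\Big(-n \sum\sum C_{\alpha\beta}\,a_\alpha a_\beta \tilde{x}_\alpha \tilde{x}_\beta\Big) \prod d\tilde{x}_\alpha, \tag{37} \]

and adding together all possible sums of the type (28) we have the asymptotic form of the distribution of the sample median

\[ D(\tilde{x}_1, \cdots, \tilde{x}_k) \prod d\tilde{x}_\alpha =\cdot\; (2n/2\pi)^{k/2} \prod a_\alpha \sqrt{C}\, \exp\Big(-n \sum\sum C_{\alpha\beta}\,a_\alpha a_\beta \tilde{x}_\alpha \tilde{x}_\beta\Big) \prod d\tilde{x}_\alpha \tag{38} \]

\[ =\cdot\; (1/2\pi)^{k/2} \sqrt{C}\, \exp\Big(-\tfrac12 \sum\sum C_{\alpha\beta}\,y_\alpha y_\beta\Big) \prod dy_\alpha, \tag{39} \]

in which the $C_{\alpha\beta}$ are functions of the $q_i$.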

4. The case of three dimensions. The computation of the coefficients $C_{\alpha\beta}$ of (39) requires the evaluation of a determinant of order $2^k - k$ for each one of them. This work was quite laborious even for $k = 3$ and the author made no attempt to find their explicit expression for larger values of $k$.

If we let a subscript $+$ indicate integration of the density function $f(x_1, x_2, x_3)$ from $0$ to $\infty$, and a subscript $-$ indicate integration from $-\infty$ to $0$, as for example,

\[ f_{++-} = \int_0^{\infty}\int_0^{\infty}\int_{-\infty}^{0} f(x_1, x_2, x_3)\,dx_3\,dx_2\,dx_1, \]

then the $q_i$ of (35) will be defined as follows

\[
\begin{aligned}
q_1 &= f_{+++} & q_5 &= f_{-++} \\
q_2 &= f_{++-} & q_6 &= f_{-+-} \\
q_3 &= f_{+-+} & q_7 &= f_{--+} \\
q_4 &= f_{+--} & q_8 &= f_{---}
\end{aligned}
\tag{40}
\]

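The coefficients $C_{\alpha\beta}$ may be written

\[
\begin{aligned}
DC_{11} &= 2(q_1 + q_5)(q_2 + q_6) \\
DC_{22} &= 2(q_1 + q_3)(q_2 + q_4) \\
DC_{33} &= 2(q_1 + q_2)(q_3 + q_4) \\
DC_{12} &= q_3q_5 + q_4q_6 - q_1q_7 - q_2q_8 \\
DC_{13} &= q_2q_5 + q_4q_7 - q_1q_6 - q_3q_8 \\
DC_{23} &= q_2q_3 + q_6q_7 - q_1q_4 - q_5q_8,
\end{aligned}
\tag{41}
\]

where

\[
\begin{aligned}
D ={}& q_1q_2q_3q_4\Big(\frac1{q_1} + \frac1{q_2} + \frac1{q_3} + \frac1{q_4}\Big) + q_5q_6q_7q_8\Big(\frac1{q_5} + \frac1{q_6} + \frac1{q_7} + \frac1{q_8}\Big) \\
&+ 2(q_5 + q_6)(q_7 + q_8)(q_1q_2 + q_3q_4) \\
&+ 2(q_5 + q_7)(q_6 + q_8)(q_1q_3 + q_2q_4) \\
&+ 2(q_5 + q_8)(q_6 + q_7)(q_1q_4 + q_2q_3) \\
&+ 8(q_1q_4q_6q_7 + q_2q_3q_5q_8).
\end{aligned}
\tag{42}
\]

(41) and (42) can of course be put in different forms by using the four relations between the $q_i$. The $a_\alpha$ of (38) are defined in (35); for $k = 3$ they are

\[
\begin{aligned}
a_1 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(0, x_2, x_3)\,dx_2\,dx_3 \\
a_2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, 0, x_3)\,dx_1\,dx_3 \\
a_3 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2, 0)\,dx_1\,dx_2 .
\end{aligned}
\tag{43}
\]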



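5. The normal distribution in two dimensions. If the density function of the second section of the paper is normal

\[ f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\Big\{-\frac{1}{2(1 - \rho^2)}\Big[\frac{x_1^2}{\sigma_1^2} - \frac{2\rho\,x_1x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\Big]\Big\}, \tag{44} \]

we find that the parameters of (26) are

\[ q_1 = \frac14 + \frac{1}{2\pi}\sin^{-1}\rho, \qquad q_2 = \frac14 - \frac{1}{2\pi}\sin^{-1}\rho, \qquad a_1 = \frac{1}{\sqrt{2\pi}\,\sigma_1}, \qquad a_2 = \frac{1}{\sqrt{2\pi}\,\sigma_2}. \tag{45} \]

These give an interesting result: the correlation coefficient of the asymptotic distribution of the sample medians is

\[ \rho^* = \frac{2}{\pi}\sin^{-1}\rho, \tag{46} \]

hence

\[ |\rho^*| \le |\rho|, \tag{47} \]

the equality sign holding only when $\rho = 0$ or $\pm 1$.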
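The relation between the correlation of the medians and $\rho$ can be examined by simulation. The following is a rough Monte Carlo sketch, not a result of the paper: the function names, sample size, number of replications and seed are all hypothetical choices, and the agreement with $(2/\pi)\sin^{-1}\rho$ is only approximate for finite $n$.

```python
import math
import random
from statistics import median

def pearson(xs, ys):
    """Ordinary product-moment correlation of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_median_correlation(rho, n=50, reps=2000, seed=0):
    """Correlation between the two coordinate medians over repeated samples."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    med1, med2 = [], []
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(2 * n + 1):          # sample of 2n + 1 elements
            g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(g1)
            ys.append(rho * g1 + s * g2)    # unit variances, correlation rho
        med1.append(median(xs))
        med2.append(median(ys))
    return pearson(med1, med2)

rho = 0.8
simulated = simulate_median_correlation(rho)
asymptotic = 2.0 / math.pi * math.asin(rho)
print(simulated, asymptotic)
```

With $\rho = 0.8$ the asymptotic value is about 0.59, noticeably smaller than $\rho$ itself, in line with the inequality stated above.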


REFERENCES 

[1] S. S. Wilks, Statistical Inference, Ann Arbor, Edwards Bros., 1937.

[2] A. T. Craig, "On the distributions of certain statistics," Amer. Jour. Math., Vol. 54 (1932), pp. 353-396.



SAMPLES FROM TWO BIVARIATE NORMAL POPULATIONS 1 

By Chung Tsi Hsu 
Columbia University 

1. Introduction. In multivariate analysis involving $p$ variates, or in analysis of variance of $m$ samples from univariate populations, we are often interested in the hypothesis of the equality of variances, viz., that

\[ \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_p^2, \quad\text{in the case of } p \text{ variates;} \]

or

\[ \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2, \quad\text{in the case of } m \text{ samples.} \]

As a matter of fact, it seldom occurs that these hypotheses are true, but the ratio between the variances might be known.

Hotelling [5] has suggested that if

\[ \sigma_1^2/k_1 = \sigma_2^2/k_2 = \cdots = \sigma_m^2/k_m = \sigma^2, \]

where the $k$'s are known constants, we can apply the transformation

\[ x_1' = w_1x_1, \quad x_2' = w_2x_2, \quad \cdots, \]

where

\[ w_1\sqrt{k_1} = w_2\sqrt{k_2} = \cdots = w_m\sqrt{k_m} = 1, \]

so that after transformation the variances become equal, i.e.,

\[ \sigma_1'^2 = \sigma_2'^2 = \cdots = \sigma_m'^2, \]

and the required analysis can be carried out. This method is similarly applicable in the multivariate case.
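The effect of this rescaling is easy to verify numerically. In the sketch below the function name `equalizing_weights` and all numerical values are hypothetical; it only uses the fact that multiplying a variate by $w$ multiplies its variance by $w^2$.

```python
import math

def equalizing_weights(ks):
    """Weights w_i satisfying w_i * sqrt(k_i) = 1 for known ratios k_i."""
    return [1.0 / math.sqrt(k) for k in ks]

sigma2 = 2.5                      # hypothetical common value of sigma_i^2 / k_i
ks = [1.0, 4.0, 9.0]              # hypothetical known constants k_i
variances = [sigma2 * k for k in ks]
weights = equalizing_weights(ks)
# Var(w * x) = w^2 * Var(x), so every transformed variance equals sigma2.
transformed = [w * w * v for w, v in zip(weights, variances)]
print(transformed)
```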

In a previous paper [7], I developed a series of hypotheses concerning samples from a bivariate normal population under the assumption that

\[ \sigma_1 = \sigma_2 . \]

In case $\sigma_1^2/k_1 = \sigma_2^2/k_2$, where $k_1$ and $k_2$ are two distinct known constants, similar results may be obtained by the use of the transformation $x_1' = w_1x_1$; $x_2' = w_2x_2$; where $w_1\sqrt{k_1} = w_2\sqrt{k_2} = 1$.


1 Presented to the American Mathematical Society at Washington, D. C., May 3,1941. 



In multivariate analysis, the hypotheses usually of interest concerning correlation coefficients may be classified in two categories, viz.,

(i) that the correlation coefficient is equal to a specified value, e.g., in simple correlation $\rho_{12} = \rho_0$, in partial correlation $\rho_{12\cdot3} = \rho_0$, in multiple correlation $\rho_{1\cdot23} = \rho_0$, or in correlation between two sets of variates [4] 2, $Q = Q_0$; of special interest is the hypothesis of the vanishing of such correlation coefficients.

(ii) that two given correlation coefficients are equal, e.g., (1) correlation coefficients $\rho_{12}$ and $\rho_{13}$ in the correlation matrix of a multivariate distribution are equal (Hotelling [6]), or (2) the correlation coefficients $\rho_{12}$ and $\rho_{12}'$ in two bivariate populations are equal.

R. A. Fisher in his earlier paper [3] introduced the transformation $z = \tfrac12 \log\frac{1+r}{1-r}$, which provides a very satisfactory, though approximate, method for the comparison of two correlation coefficients. Brander [1] treated the same problem by the method of the likelihood ratio criterion.

The present paper is an attempt to obtain different criteria by the likelihood ratio method (Neyman and Pearson [9], [10], [11]) for testing, by means of samples, the equality of correlation coefficients in two bivariate normal populations under the following sets of conditions: (1) $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$; (2) $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$ and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$. The results may be extended to the cases (3) $\sigma_1^2/k_1 = \sigma_2^2/k_2$ and $\sigma_1'^2/k_1' = \sigma_2'^2/k_2'$; (4) $\sigma_1^2/k_1 = \sigma_2^2/k_2$, $\xi_1/\sqrt{k_1} = \xi_2/\sqrt{k_2}$ and $\sigma_1'^2/k_1' = \sigma_2'^2/k_2'$, $\xi_1'/\sqrt{k_1'} = \xi_2'/\sqrt{k_2'}$, where the $k$'s are known constants.


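2. The hypotheses. Two samples, each being of two variates $(x_1, x_2)$ and $(x_1', x_2')$, of size $N$ and $N'$, are supposed to be drawn at random, respectively, from two independent normal bivariate populations, with the following distributions:

\[ \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\Big\{-\frac{1}{2(1 - \rho^2)}\Big[\Big(\frac{x_1 - \xi_1}{\sigma_1}\Big)^2 - 2\rho\,\frac{(x_1 - \xi_1)(x_2 - \xi_2)}{\sigma_1\sigma_2} + \Big(\frac{x_2 - \xi_2}{\sigma_2}\Big)^2\Big]\Big\}, \tag{1} \]

\[ \frac{1}{2\pi\sigma_1'\sigma_2'\sqrt{1 - \rho'^2}} \exp\Big\{-\frac{1}{2(1 - \rho'^2)}\Big[\Big(\frac{x_1' - \xi_1'}{\sigma_1'}\Big)^2 - 2\rho'\,\frac{(x_1' - \xi_1')(x_2' - \xi_2')}{\sigma_1'\sigma_2'} + \Big(\frac{x_2' - \xi_2'}{\sigma_2'}\Big)^2\Big]\Big\}, \tag{2} \]

where $\xi_1, \xi_2, \sigma_1, \sigma_2, \rho$; $\xi_1', \xi_2', \sigma_1', \sigma_2', \rho'$ are the unknown parameters of the populations.

The hypotheses to be considered in the present paper are:

$H_1$: Assuming $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, to test $\rho = \rho'$.

$H_2$: Assuming $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$, and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$, to test $\rho = \rho'$.


2 See bibliography at the end of the paper.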


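The derivation and the distribution of the criteria for testing these hypotheses may be simplified by the following simultaneous transformations:

\[ X = \frac{1}{\sqrt2}(x_1 - x_2), \qquad Y = \frac{1}{\sqrt2}(x_1 + x_2), \tag{3} \]

\[ X' = \frac{1}{\sqrt2}(x_1' - x_2'), \qquad Y' = \frac{1}{\sqrt2}(x_1' + x_2'). \tag{4} \]

The corresponding normal bivariate distributions in the transformed variables $(X, Y)$ and $(X', Y')$ are obtained, viz.

\[ \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{XY}^2}} \exp\Big\{-\frac{1}{2(1 - \rho_{XY}^2)}\Big[\Big(\frac{X - \xi}{\sigma_X}\Big)^2 - 2\rho_{XY}\,\frac{(X - \xi)(Y - \eta)}{\sigma_X\sigma_Y} + \Big(\frac{Y - \eta}{\sigma_Y}\Big)^2\Big]\Big\}, \tag{5} \]

\[ \frac{1}{2\pi\sigma_{X'}\sigma_{Y'}\sqrt{1 - \rho_{X'Y'}^2}} \exp\Big\{-\frac{1}{2(1 - \rho_{X'Y'}^2)}\Big[\Big(\frac{X' - \xi'}{\sigma_{X'}}\Big)^2 - 2\rho_{X'Y'}\,\frac{(X' - \xi')(Y' - \eta')}{\sigma_{X'}\sigma_{Y'}} + \Big(\frac{Y' - \eta'}{\sigma_{Y'}}\Big)^2\Big]\Big\}. \tag{6} \]

The conditions corresponding to

\[ \sigma_1 = \sigma_2 \quad\text{and}\quad \sigma_1' = \sigma_2', \tag{7} \]

are that

\[ \rho_{XY} = 0 \quad\text{and}\quad \rho_{X'Y'} = 0. \tag{8} \]

Also, for a given $\rho$ and $\rho'$, we have from (7)

\[ \sigma_Y^2 = \gamma\sigma_X^2 \quad\text{and}\quad \sigma_{Y'}^2 = \gamma'\sigma_{X'}^2, \tag{9} \]

where

\[ \gamma = \frac{1 + \rho}{1 - \rho} \quad\text{and}\quad \gamma' = \frac{1 + \rho'}{1 - \rho'}. \tag{10} \]

Following the notation of (9) and (10), the hypotheses $H_1'$ and $H_2'$ corresponding to $H_1$ and $H_2$ are:

$H_1'$: Assuming $\rho_{XY} = 0$, and $\rho_{X'Y'} = 0$, to test $\gamma = \gamma'$.

$H_2'$: Assuming $\rho_{XY} = 0$, $\xi = 0$, and $\rho_{X'Y'} = 0$, $\xi' = 0$, to test $\gamma = \gamma'$.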
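The way the transformation turns equality of variances into zero correlation, and the correlation $\rho$ into a variance ratio, can be verified directly. The sketch below is an illustration with a hypothetical function name and hypothetical numbers; it uses only the elementary rules for the variance and covariance of linear combinations.

```python
def rotated_moments(sigma1, sigma2, rho):
    """Variances and covariance of X = (x1 - x2)/sqrt(2), Y = (x1 + x2)/sqrt(2)."""
    cross = rho * sigma1 * sigma2
    var_x = (sigma1 ** 2 + sigma2 ** 2 - 2 * cross) / 2
    var_y = (sigma1 ** 2 + sigma2 ** 2 + 2 * cross) / 2
    cov_xy = (sigma1 ** 2 - sigma2 ** 2) / 2
    return var_x, var_y, cov_xy

rho = 0.6
var_x, var_y, cov_xy = rotated_moments(1.7, 1.7, rho)   # equal standard deviations
gamma = (1 + rho) / (1 - rho)
print(cov_xy, var_y / var_x, gamma)   # covariance 0, variance ratio equal to gamma
```

When the two standard deviations differ, the covariance of $X$ and $Y$ is no longer zero, which is why the assumption (7) is needed for the reduction.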


3. The derivation of the criteria. Let $(x_{1i}, x_{2i})$, $(x_{1j}', x_{2j}')$ be the measurements of the characters on the $i$th and $j$th individuals in the two samples from their respective populations. After transformation, the corresponding measurements become $(X_i, Y_i)$ and $(X_j', Y_j')$. Let $p(E)$ denote the joint elementary probability law of the $N$ and $N'$ observations, $E = (X_1, Y_1, \cdots, X_N, Y_N; X_1', Y_1', \cdots, X_{N'}', Y_{N'}')$.

Following Neyman and Pearson, we shall use $\Omega$ to designate the class of admissible populations under conditions which can be assumed to be satisfied in any case; and $\omega$ to designate a subclass of $\Omega$ under conditions which are satisfied only if the hypothesis to be tested is true.

Thus for $H_1'$, $\Omega$ specifies $\rho_{XY} = \rho_{X'Y'} = 0$, any real values of $\xi, \eta, \xi', \eta'$ and any positive values of $\sigma_X, \sigma_Y, \sigma_{X'}, \sigma_{Y'}$; $\omega$ specifies $\rho_{XY} = \rho_{X'Y'} = 0$, any real values of $\xi, \eta, \xi', \eta'$ and any positive values of $\sigma_X$, $\sigma_{X'}$ and $\gamma$ which are defined by (9). While for $H_2'$, $\Omega$ specifies $\rho_{XY} = \rho_{X'Y'} = 0$, $\xi = \xi' = 0$, any real values of $\eta$ and $\eta'$ and any positive values of $\sigma_X, \sigma_Y, \sigma_{X'}, \sigma_{Y'}$; $\omega$ specifies $\rho_{XY} = \rho_{X'Y'} = 0$, $\xi = \xi' = 0$, any real values of $\eta$ and $\eta'$, and any positive values of $\sigma_X$, $\sigma_{X'}$ and $\gamma$ which are defined by (9).

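For our hypothesis $H_1'$, the values of the parameters required to make $p(\Omega)$ a maximum are:

\[ \hat\xi = \bar{X}, \quad \hat\eta = \bar{Y}, \quad \hat\sigma_X^2 = s_X^2, \quad \hat\sigma_Y^2 = s_Y^2, \]

\[ \hat\xi' = \bar{X}', \quad \hat\eta' = \bar{Y}', \quad \hat\sigma_{X'}^2 = s_{X'}^2, \quad \hat\sigma_{Y'}^2 = s_{Y'}^2 . \]

Thus

\[ p(\Omega \max) = \Big(\frac{1}{2\pi e}\Big)^{N+N'} \frac{1}{s_X^N\,s_Y^N\,s_{X'}^{N'}\,s_{Y'}^{N'}} . \]

To obtain $p(\omega \max)$, let us define, according to the notation in the writer's previous paper [7],

\[ R_1 = \frac{2r\,s_1s_2}{s_1^2 + s_2^2} \quad\text{and}\quad R_1' = \frac{2r'\,s_1's_2'}{s_1'^2 + s_2'^2}, \]

and

\[ u = \frac{s_Y^2}{s_X^2} = \frac{1 + R_1}{1 - R_1}, \qquad u' = \frac{s_{Y'}^2}{s_{X'}^2} = \frac{1 + R_1'}{1 - R_1'} . \]

Then the values making $p(\omega)$ a maximum are:

\[ \hat\xi = \bar{X}, \quad \hat\eta = \bar{Y}, \quad \hat\sigma_X^2 = \frac{s_X^2}{2\hat\gamma}(\hat\gamma + u), \]

\[ \hat\xi' = \bar{X}', \quad \hat\eta' = \bar{Y}', \quad \hat\sigma_{X'}^2 = \frac{s_{X'}^2}{2\hat\gamma}(\hat\gamma + u'), \]

and $\hat\gamma$ is the positive root of the equation

\[ (N + N')\gamma^2 - (N - N')(u - u')\gamma - (N + N')uu' = 0 \]

or

\[ \hat\gamma = \frac{(N - N')(u - u') + \sqrt{(N - N')^2(u - u')^2 + 4(N + N')^2uu'}}{2(N + N')} = \gamma_1, \text{ say.} \tag{11} \]

Then

\[ p(\omega \max) = \Big(\frac{1}{2\pi e}\Big)^{N+N'} \Big[\frac{2\sqrt{\gamma_1}}{(\gamma_1 + u)\,s_X^2}\Big]^{N} \Big[\frac{2\sqrt{\gamma_1}}{(\gamma_1 + u')\,s_{X'}^2}\Big]^{N'}, \]

and the likelihood ratio criterion for the hypothesis $H_1'$ is

\[ \lambda_1 = \frac{p(\omega \max)}{p(\Omega \max)} = \Big[\frac{2\sqrt{\gamma_1}\,s_Y}{(\gamma_1 + u)\,s_X}\Big]^{N} \Big[\frac{2\sqrt{\gamma_1}\,s_{Y'}}{(\gamma_1 + u')\,s_{X'}}\Big]^{N'} = \Big[\frac{2\sqrt{\gamma_1 u}}{\gamma_1 + u}\Big]^{N} \Big[\frac{2\sqrt{\gamma_1 u'}}{\gamma_1 + u'}\Big]^{N'}. \tag{12} \]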
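The criterion can be computed from the two variance ratios $u$, $u'$ and the sample sizes. The sketch below is illustrative only: the function names and the numerical inputs are hypothetical, and it assumes the criterion has the form $[2\sqrt{\gamma_1u}/(\gamma_1+u)]^{N}\,[2\sqrt{\gamma_1u'}/(\gamma_1+u')]^{N'}$ with $\gamma_1$ the positive root of the quadratic leading to (11).

```python
import math

def gamma_hat(u, u_prime, n, n_prime):
    """Positive root of (N+N') g^2 - (N-N')(u-u') g - (N+N') u u' = 0."""
    s = n + n_prime
    d = (n - n_prime) * (u - u_prime)
    return (d + math.sqrt(d * d + 4 * s * s * u * u_prime)) / (2 * s)

def likelihood_ratio(u, u_prime, n, n_prime):
    """Likelihood ratio criterion as a function of u, u' and the sample sizes."""
    g = gamma_hat(u, u_prime, n, n_prime)
    first = 2 * math.sqrt(g * u) / (g + u)
    second = 2 * math.sqrt(g * u_prime) / (g + u_prime)
    return first ** n * second ** n_prime

# Hypothetical values: equal sample sizes, then unequal ones.
print(gamma_hat(2.0, 3.0, 10, 10))          # equals sqrt(2 * 3) when N = N'
print(likelihood_ratio(2.0, 3.0, 10, 10))
print(likelihood_ratio(2.0, 3.0, 12, 30))
```

Each bracket is a ratio of a geometric to an arithmetic mean, so the criterion never exceeds unity and equals unity only when $u = u'$.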


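For $H_2'$, the values of the parameters to make $p(\Omega)$ a maximum are:

\[ \hat\eta = \bar{Y}, \quad \hat\sigma_X^2 = \frac1N \Sigma X^2, \quad \hat\sigma_Y^2 = s_Y^2, \]

\[ \hat\eta' = \bar{Y}', \quad \hat\sigma_{X'}^2 = \frac1{N'} \Sigma X'^2, \quad \hat\sigma_{Y'}^2 = s_{Y'}^2 . \]

Thus

\[ p(\Omega \max) = \Big(\frac{1}{2\pi e}\Big)^{N+N'} \frac{\sqrt{N^N N'^{N'}}}{(\Sigma X^2)^{N/2}\,(\Sigma X'^2)^{N'/2}\,s_Y^{N}\,s_{Y'}^{N'}} . \]

Similarly, if we write

\[ R_2 = \frac{2r\,s_1s_2 - \tfrac12(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar{x}_1 - \bar{x}_2)^2}, \qquad R_2' = \frac{2r'\,s_1's_2' - \tfrac12(\bar{x}_1' - \bar{x}_2')^2}{s_1'^2 + s_2'^2 + \tfrac12(\bar{x}_1' - \bar{x}_2')^2}, \]

and

\[ v = \frac{N s_Y^2}{\Sigma X^2} = \frac{1 + R_2}{1 - R_2}, \qquad v' = \frac{N' s_{Y'}^2}{\Sigma X'^2} = \frac{1 + R_2'}{1 - R_2'}, \]

the values to make $p(\omega)$ a maximum are:

\[ \hat\eta = \bar{Y}, \quad \hat\sigma_X^2 = \frac{\Sigma X^2}{2N\hat\gamma}(\hat\gamma + v), \]

\[ \hat\eta' = \bar{Y}', \quad \hat\sigma_{X'}^2 = \frac{\Sigma X'^2}{2N'\hat\gamma}(\hat\gamma + v'), \]

\[ \hat\gamma = \frac{(N - N')(v - v') + \sqrt{(N - N')^2(v - v')^2 + 4(N + N')^2vv'}}{2(N + N')} = \gamma_2, \text{ say.} \tag{13} \]

Then

\[ p(\omega \max) = \Big(\frac{1}{2\pi e}\Big)^{N+N'} \Big[\frac{2N\sqrt{\gamma_2}}{(\gamma_2 + v)\,\Sigma X^2}\Big]^{N} \Big[\frac{2N'\sqrt{\gamma_2}}{(\gamma_2 + v')\,\Sigma X'^2}\Big]^{N'}, \]

and the likelihood ratio criterion for the hypothesis $H_2'$ is

\[ \lambda_2 = \frac{p(\omega \max)}{p(\Omega \max)} = \Big[\frac{2\sqrt{N\gamma_2}\,s_Y}{(\gamma_2 + v)\sqrt{\Sigma X^2}}\Big]^{N} \Big[\frac{2\sqrt{N'\gamma_2}\,s_{Y'}}{(\gamma_2 + v')\sqrt{\Sigma X'^2}}\Big]^{N'} = \Big[\frac{2\sqrt{\gamma_2 v}}{\gamma_2 + v}\Big]^{N} \Big[\frac{2\sqrt{\gamma_2 v'}}{\gamma_2 + v'}\Big]^{N'}. \tag{14} \]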

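The case $N = N'$. The above criteria $\lambda_1$ and $\lambda_2$ cannot in general be expressed simply, but when $N = N'$, by (11) and (13)

\[ \gamma_1 = \sqrt{uu'}, \qquad \gamma_2 = \sqrt{vv'}, \]

and

\[ \lambda_1 = \Big[\frac{4\sqrt{uu'}}{(\sqrt{u} + \sqrt{u'})^2}\Big]^{N}, \qquad \lambda_2 = \Big[\frac{4\sqrt{vv'}}{(\sqrt{v} + \sqrt{v'})^2}\Big]^{N}, \tag{15} \]

or we may express as monotonic functions of $\lambda_1$ and $\lambda_2$,

\[ L_1 = \lambda_1^{1/N} = \frac{4\sqrt{uu'}}{(\sqrt{u} + \sqrt{u'})^2}, \qquad L_2 = \lambda_2^{1/N} = \frac{4\sqrt{vv'}}{(\sqrt{v} + \sqrt{v'})^2}, \tag{16} \]

in the present case.

Furthermore, if we introduce

\[ z = \tfrac12 \log u, \quad\text{and}\quad z' = \tfrac12 \log u', \tag{17} \]

we have

\[ \tfrac12(z - z') = \tfrac14 \log\frac{u}{u'} \quad\text{or}\quad e^{\frac12(z - z')} = \Big(\frac{u}{u'}\Big)^{1/4}. \]

Thus $L_1$ can be written in terms of $z$ and $z'$

\[ L_1 = \frac{4}{\big(e^{\frac12(z - z')} + e^{-\frac12(z - z')}\big)^2} = \frac{1}{\cosh^2 \tfrac12(z - z')} = \operatorname{sech}^2 \tfrac12(z - z'), \tag{18} \]

and $z - z' = w$, say, may be used also as a criterion for $H_1$.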
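The equivalence of the two expressions for the criterion in the equal-sample-size case can be confirmed numerically. In this sketch the function names and the values of $u$ and $u'$ are hypothetical; the first function uses the form $4\sqrt{uu'}/(\sqrt{u}+\sqrt{u'})^2$ and the second the hyperbolic form in $z = \tfrac12\log u$, $z' = \tfrac12\log u'$.

```python
import math

def l1_from_ratios(u, u_prime):
    """4 sqrt(u u') / (sqrt(u) + sqrt(u'))^2."""
    return 4 * math.sqrt(u * u_prime) / (math.sqrt(u) + math.sqrt(u_prime)) ** 2

def l1_from_z(z, z_prime):
    """sech^2 of half the difference z - z'."""
    return 1.0 / math.cosh(0.5 * (z - z_prime)) ** 2

u, u_prime = 2.0, 5.0                      # hypothetical variance ratios
z, z_prime = 0.5 * math.log(u), 0.5 * math.log(u_prime)
print(l1_from_ratios(u, u_prime), l1_from_z(z, z_prime))   # the two values agree
```

Because the hyperbolic secant is even and decreasing in $|w|$, small values of the criterion correspond to large values of $|z - z'|$.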

We shall now proceed to obtain the distributions of some of these statistics. 


4. The distributions of $u/u'$ and $v/v'$. Since $Ns_X^2/\sigma_X^2$ and $Ns_Y^2/\sigma_Y^2$ have independently the $\chi^2$ distribution with $N - 1$ degrees of freedom,

\[ u = \frac{s_Y^2}{s_X^2} = \frac{\sigma_Y^2\chi_2^2}{\sigma_X^2\chi_1^2} = \frac{\gamma\chi_2^2}{\chi_1^2}, \]

and $u/\gamma$ has the $F$ distribution with degrees of freedom $f_1 = N - 1$, $f_2 = N - 1$. Similarly, $u'/\gamma' = \chi_2'^2/\chi_1'^2$ has the $F$ distribution with the same numbers of degrees of freedom (since $N = N'$, in the present case).

If the hypothesis $H_1$ is true (i.e., $\gamma = \gamma'$)

\[ \frac{u}{u'} = \frac{\chi_2^2\,\chi_1'^2}{\chi_1^2\,\chi_2'^2} = \frac{\theta_2\theta_1'}{\theta_1\theta_2'} = \frac{z_1}{z_2}, \tag{19} \]

where $\theta\,(= \tfrac12\chi^2)$ or $\theta_i$ is distributed as

\[ \frac{1}{\Gamma(a)}\,\theta^{a-1}e^{-\theta}\,d\theta \tag{20} \]

with $a = \tfrac12(N - 1)$, and $z_1\,(= \theta_2\theta_1')$, $z_2\,(= \theta_1\theta_2')$ follow independently the Wilks' $z$-distribution, [14], which we shall study in detail for the present case.

Distribution of $z$ when $p = 2$. Consider

\[ z = B\,\theta_1\theta_2\cdots\theta_p . \]

Wilks has succeeded in integrating the distribution of $z$ for the case $p = 2$ for special values of $a$'s, e.g. $a_1 = \tfrac12(N - 1)$, $a_2 = \tfrac12(N - 2)$. Now we want the distribution of $z$ when $p = 2$ and for any values of $a$, and then for $a_1 = a_2 = \tfrac12(N - 1)$.

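By (20) the joint distribution of $\theta_1$ and $\theta_2$ is

\[ \frac{1}{\Gamma(a_1)\Gamma(a_2)}\,\theta_1^{a_1-1}e^{-\theta_1}\,\theta_2^{a_2-1}e^{-\theta_2}\,d\theta_1\,d\theta_2 . \]

Applying the transformation $z = B\theta_1\theta_2$, $v_1 = \theta_1$, the joint distribution of $v_1$, $z$ is

\[ \frac{1}{\Gamma(a_1)\Gamma(a_2)}\,v_1^{a_1-1}e^{-v_1}\Big(\frac{z}{Bv_1}\Big)^{a_2-1} e^{-z/Bv_1}\,\frac{dv_1\,dz}{Bv_1} . \]

Integrating $v_1$ from $v_1 = 0$ to $v_1 = \infty$, we have the distribution of $z$, viz.,

\[ \frac{z^{a_2-1}\,dz}{B^{a_2}\Gamma(a_1)\Gamma(a_2)} \int_0^{\infty} v_1^{a_1-a_2-1}\,e^{-v_1 - z/Bv_1}\,dv_1 . \tag{21} \]

In order to evaluate the integral of (21), consider the transformation $v_1 = y^2$, $dv_1 = 2y\,dy$; we have

\[ I_0 = 2\int_0^{\infty} y^{2(a_1-a_2)-1}\,e^{-y^2 - z/By^2}\,dy . \tag{22} \]

To evaluate $I_0$ for any $a$'s, by putting $y = 1/x$, $dy = -dx/x^2$, we have

\[ I_0 = 2\int_0^{\infty} \frac{e^{-zx^2/B - 1/x^2}}{x^{2(a_1-a_2)+1}}\,dx . \tag{23} \]

Consider

\[ \frac{\Gamma(a_1 - a_2 + \tfrac12)}{x^{2(a_1-a_2)+1}} = \int_0^{\infty} e^{-x^2y}\,y^{a_1-a_2-\frac12}\,dy . \tag{24} \]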



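Then

\[
\begin{aligned}
I_0\,\Gamma(a_1 - a_2 + \tfrac12) &= 2\int_0^{\infty} e^{-(zx^2/B + 1/x^2)}\,dx \int_0^{\infty} e^{-x^2y}\,y^{a_1-a_2-\frac12}\,dy \\
&= 2\int_0^{\infty} y^{a_1-a_2-\frac12}\,dy \int_0^{\infty} e^{-[(z/B + y)x^2 + 1/x^2]}\,dx \\
&= \sqrt{\pi}\int_0^{\infty} \frac{e^{-2\sqrt{z/B + y}}}{\sqrt{z/B + y}}\;y^{a_1-a_2-\frac12}\,dy .
\end{aligned}
\]

By the substitution $\sqrt{z/B + y} = x + \sqrt{z/B}$, or $y = x^2 + 2x\sqrt{z/B}$, $dy = 2(x + \sqrt{z/B})\,dx$, and therefore

\[ I_0\,\Gamma(a_1 - a_2 + \tfrac12) = 2\sqrt{\pi}\int_0^{\infty} e^{-2(\sqrt{z/B} + x)}\,\big(x^2 + 2x\sqrt{z/B}\big)^{a_1-a_2-\frac12}\,dx, \]

\[ I_0 = \frac{2\sqrt{\pi}\,e^{-2\sqrt{z/B}}}{\Gamma(a_1 - a_2 + \tfrac12)} \int_0^{\infty} e^{-2x}\,\big(x^2 + 2x\sqrt{z/B}\big)^{a_1-a_2-\frac12}\,dx . \tag{25} \]

Hence, $z$ is distributed as

\[ \frac{2\sqrt{\pi}\,z^{a_2-1}\,e^{-2\sqrt{z/B}}}{B^{a_2}\,\Gamma(a_1)\Gamma(a_2)\Gamma(a_1 - a_2 + \tfrac12)} \int_0^{\infty} e^{-2x}\,\big(x^2 + 2x\sqrt{z/B}\big)^{a_1-a_2-\frac12}\,dx\;dz . \tag{26} \]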




We infer from this distribution that when $2(a_1 - a_2)$, i.e., the difference of degrees of freedom, is odd, the integral can be expressed as a terminated series; but for even values of $2(a_1 - a_2)$, the series is infinite.

When B = ~,ai = i(N — 1 ), oj = i(W — 2), (26) is reduced to 

A 

( 27 ) VtA-V’-V^ 

r(ai)r(aj) 

which is Wilks’ £ distribution, [15], for p = 2 , 

When $B = 1$ and $a_1 = a_2 = \tfrac12(N - 1)$, it becomes

\[ \frac{2\,z^{a_2-1}\,e^{-2\sqrt{z}}\,dz}{\Gamma(a_1)\Gamma(a_2)} \int_0^{\infty} e^{-2x}\,(2\sqrt{z} + x)^{-\frac12}\,x^{-\frac12}\,dx, \tag{28} \]

which is the distribution of $z$ involved in (19).

Since (28) can apparently not be simplified, I have been unable thus far to find in manageable form the distribution of the ratio $z_1/z_2$ and therefore of $u/u'$ in this case. However, it would be simpler to use the alternative criterion $w = z - z'$ for the hypothesis $H_1$. The distribution of $w$ will be taken up in a later section.


The distribution of $v/v'$: Since $Ns_Y^2/\sigma_Y^2$ and $\Sigma X^2/\sigma_X^2$ have independently the $\chi^2$ distribution with $N - 1$ and $N$ degrees of freedom respectively, therefore,

\[ v = \frac{Ns_Y^2}{\Sigma X^2} = \frac{\sigma_Y^2\chi_2^2}{\sigma_X^2\chi_1^2} = \frac{\gamma\chi_2^2}{\chi_1^2}, \]

and $\dfrac{v}{\gamma}\Big/\dfrac{N-1}{N}$ has the $F$-distribution with $f_1 = N - 1$ degrees of freedom and $f_2 = N$. Similarly $\dfrac{v'}{\gamma'}\Big/\dfrac{N-1}{N}$ has the $F$-distribution with degrees of freedom $f_1$ and $f_2$ as above.

If the hypothesis $H_2$ is true (i.e., $\gamma = \gamma'$),

\[ \frac{v}{v'} = \frac{\chi_2^2\,\chi_1'^2}{\chi_1^2\,\chi_2'^2} = \frac{\theta_2\theta_1'}{\theta_1\theta_2'} = \frac{z_1}{z_2}, \]

where each $\theta_i$ is distributed as in (20), but with $a_1 = \tfrac12 N$ and $a_2 = \tfrac12(N - 1)$. We can infer from (27) that $t_1 = 4\sqrt{z_1}$ and $t_2 = 4\sqrt{z_2}$ have independently the $\chi^2$-distribution each with $4a_2$ or $2(N - 1)$ degrees of freedom, and $t_1/t_2 = \sqrt{z_1/z_2} = \sqrt{v/v'}$ follows the $F$-distribution with degrees of freedom $f_1 = f_2 = 2(N - 1)$. The 5% and 1% points of the $F = \sqrt{v/v'}$ may be obtained from Snedecor's table ([12], p. 174).
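The passage from the distribution of $z$ to a $\chi^2$ distribution for $t = 4\sqrt{z}$ can be checked by a change of variable. The sketch below is an illustration under an explicit assumption: it takes the density of $z$ for $B = 1$ and $a_1 = a_2 + \tfrac12$ to be $\sqrt{\pi}\,z^{a_2-1}e^{-2\sqrt{z}}/(\Gamma(a_1)\Gamma(a_2))$, which is how the case $a_1 - a_2 = \tfrac12$ is read here. The function names and numerical values are hypothetical.

```python
import math

def z_density(z, a2):
    """Assumed density of z for B = 1 and a1 = a2 + 1/2."""
    a1 = a2 + 0.5
    return (math.sqrt(math.pi) * z ** (a2 - 1) * math.exp(-2 * math.sqrt(z))
            / (math.gamma(a1) * math.gamma(a2)))

def chi2_density(t, dof):
    """Density of the chi-square distribution with dof degrees of freedom."""
    h = dof / 2
    return t ** (h - 1) * math.exp(-t / 2) / (2 ** h * math.gamma(h))

def t_density(t, a2):
    """Density of t = 4 sqrt(z), from z = t^2/16 and dz/dt = t/8."""
    return z_density(t * t / 16, a2) * t / 8

a2 = 2.5                          # hypothetical: a2 = (N - 1)/2 with N = 6
for t in (0.7, 3.0, 7.3, 15.0):
    print(t_density(t, a2), chi2_density(t, 4 * a2))   # pairs should agree
```

The agreement follows from the duplication formula for the gamma function, $\Gamma(a_2)\Gamma(a_2 + \tfrac12) = 2^{1-2a_2}\sqrt{\pi}\,\Gamma(2a_2)$.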


5. The distribution of $y = \log z$. Wald [13] has suggested that the distribution of $z = B\theta_1\theta_2\cdots\theta_p$ for any $a_i$'s $(i = 1, \cdots, p)$ may also be obtained indirectly with the aid of the characteristic function. A similar method has been applied in a recent paper by Wald and Brookner [14]. Consider the transformation

\[ y = \log z = \log B\theta_1\theta_2\cdots\theta_p . \tag{29} \]

The characteristic function of $y$ is

\[ \varphi_y(t) = E(e^{ty}) = E\{(B\theta_1\theta_2\cdots\theta_p)^t\} = \frac{B^t\,\Gamma(a_1 + t)\Gamma(a_2 + t)\cdots\Gamma(a_p + t)}{\Gamma(a_1)\Gamma(a_2)\cdots\Gamma(a_p)} . \tag{30} \]

Thus the distribution $f(y)\,dy$ is given by

\[ f(y) = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{-ty}\varphi_y(t)\,dt = \frac{1}{2\pi i}\int_{-i\infty}^{i\infty} B^t e^{-ty} \prod_{i=1}^{p}\frac{\Gamma(a_i + t)}{\Gamma(a_i)}\,dt . \tag{31} \]

Without loss of generality, we may take $a_1 \ge a_2 \ge \cdots \ge a_p > 0$ and let $a_p + t = -t'$; then

\[ f(y) = \frac{c_p}{2\pi i}\int_{-a_p - i\infty}^{-a_p + i\infty} e^{yt'} B^{-t'} \prod_{i=1}^{p} \Gamma(a_i - a_p - t')\,dt', \tag{32} \]

where $c_p = e^{a_py}B^{-a_p}\Big/\prod_{i=1}^{p}\Gamma(a_i)$.

/ 1—1 



The integration can be carried out by the method of residues along the contour $C$, bounded by the line $x = -a_p$ and that part of the circle with center at origin and radius $r$ which lies to the right of the line $x = -a_p$. The integral of the function $e^{t'y} B^{-t'} \prod_{i=1}^{p}\Gamma(a_i - a_p - t')$ along the arc converges to zero as the radius of the circle tends to infinity (Kullback, [8]). Hence the integrals along the vertical line $x + a_p = 0$ and along the closed contour $C$ are equal. Then we may write

\[ f(y) = \frac{c_p}{2\pi i}\int_C e^{t'y} B^{-t'} \prod_{i=1}^{p}\Gamma(a_i - a_p - t')\,dt', \tag{33} \]

and its value is $c_p$ times the sum of the residues at the poles within the contour $C$.

For the present purpose, $p = 2$, we have

\[ f(y) = \frac{c_2}{2\pi i}\int_C e^{t'y}\,\Gamma(a_1 - a_2 - t')\,\Gamma(-t')\,dt' . \tag{34} \]

We shall study the integral of (34) in more detail in the following cases:

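(i) $a_1 - a_2 = \tfrac12$. By the duplication formula

\[ \Gamma(\tfrac12 - t')\,\Gamma(-t') = 2^{1+2t'}\sqrt{\pi}\,\Gamma(-2t'), \]

and the function

\[ \Gamma(-2t') = \lim_{N\to\infty} \frac{N!\,N^{-2t'}}{(-2t')(-2t' + 1)\cdots(-2t' + N)} \]

has simple poles at the points $0, \tfrac12, 1, \tfrac32, \cdots$. The residue at $t' = m/2$, where $m$ is zero or a positive integer, is $(-1)^{m+1}/(2\cdot m!)$ and (34) becomes

\[ f(y) = \sqrt{\pi}\,c_2\Big(1 - 2e^{\frac12 y} + \frac{2^2}{2!}e^{y} - \frac{2^3}{3!}e^{\frac32 y} + \cdots\Big) = \sqrt{\pi}\,c_2\,e^{-2e^{y/2}} . \tag{35} \]

The distribution of $z = e^y$ is

\[ \frac{\sqrt{\pi}\,z^{a_2-1}\,e^{-2\sqrt{z}}}{\Gamma(a_1)\Gamma(a_2)}\,dz . \tag{27 bis} \]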

(ii) oi — at = m + The function 
r(ffl! - oj - <')r(-0 = (m - 4 - f')(m - 4 - 0 


» - W4 - 0r(-0 

= 2 l+v> \/Trim - 4 - t’){m - 4 - O ... (4 — Or(—20 





has simple poles at 0, m, m + £, m + 1, • • ■ , and 


f(y) = V tv c 2 


(2m — 1)! 


[2 — 1)1 2 m (2 to ) 


(2V) M + 


(2V) m4 * 


/- r ( 2 w 


(2m - 1)1 


1 00 

_Iy 

OrH ' 


2 m (2m + 1 ) 

2^TOi (2V) " +1+ -'-] 

( 2 2 e“) m+1,fl J. 


(w — 1)! 2* Y «o (2m + 7)7! 

This agrees with the expansion of (26) when we put a x — at — | = m. 
(iii) ai — aj = 0. The function 


[r(—<0] 2 - lim 


(WAT 




(-t'yi-t 1 + i) 2 ... (-«' + N¥’ 

has poles of the second order at the points 0, 1, 2, 3, ■ •. and 


f(v) {(i' - y)V'nr(-{')]V7 

7-0 at 


(iv) fli — 02 = m. The function 
r(m - t')r(-l') = (m - 1 - t')(m - 2 - f) . .. (1 - 

) 

has finite simple poles at 1, 2, • • • ,m — 1 and poles of the second order at m, 
m + 1, • • • , and 

f(y) = Cj £ [(f - y)e l ' v T(m - t')T(-l')}r- r 
7 -0 

+ c 2 E W - Y)V'*T(m - or(-io) • 

7—m (Clt ) ('—7 


6. The distribution of $w = z - z'$ or $\psi = \cosh w$. Since the distribution of $u$ is given in [7] as

<*» an o + * 

therefore, by transformation (17), we have that the distribution of 2 for a given 
f = i log y = J log — is 

1 — P 

sech” (z — f) dz, 



(40) 



290 


CHUNG TSX HSU 


where n = N — 1. The distribution of z has been given by It, A. Fisher [3] 
for n = 1 and by Delury [21. Similarly, the distribution of z' for a given iB 


(41) 


B 


(\ iA sech " ^ ^ dz '' 

\2’j) 


where n 1 ~ N' — 1. 

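In case n = n', the joint distribution of z and z' for a given common ζ is

(42)    $C\,\operatorname{sech}^n(z-\zeta)\,\operatorname{sech}^n(z'-\zeta)\,dz\,dz' = \dfrac{C\,dz\,dz'}{\cosh^n(z-\zeta)\,\cosh^n(z'-\zeta)}$,

where $1/C = \left[B\left(\tfrac12, \tfrac12 n\right)\right]^2$.

By the transformation $\bar z = \tfrac12(z + z')$, $w = z - z'$, we have the joint distribution of $\bar z$ and w,

(43)    $\dfrac{C\,d\bar z\,dw}{\cosh^n(z-\zeta)\,\cosh^n(z'-\zeta)} = \dfrac{2^n C\,d\bar z\,dw}{[\cosh 2(\bar z-\zeta) + \cosh w]^n}$.

Integrating with respect to $\bar z$ from $-\infty$ to $\infty$, we have

(44)    $2^n C\,dw \displaystyle\int_{-\infty}^{\infty} \frac{d\bar z}{[\cosh 2(\bar z-\zeta) + \cosh w]^n} = 2^n C\,dw \int_{\zeta}^{\infty} \frac{2\,d\bar z}{[\cosh 2(\bar z-\zeta) + \cosh w]^n} = 2^n C\,dw\,I_n$, say.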

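Applying the transformation $\phi = 2(\bar z - \zeta)$, $\psi = \cosh w$, the integral of (44)
becomes

$I_n = \displaystyle\int_0^{\infty} \frac{d\phi}{(\cosh\phi + \psi)^n}$.

Substituting $\cosh\phi + \psi = \dfrac{1+\psi}{\theta}$, we have

(45)    $I_n = \dfrac{1}{(1+\psi)^n} \displaystyle\int_0^1 \theta^{n-1}(1-\theta)^{-1/2}\left(1 - \frac{\psi-1}{\psi+1}\,\theta\right)^{-1/2} d\theta$.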


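Comparing (45) with the hypergeometric function

(46)    $\displaystyle\int_0^1 \theta^{b-1}(1-\theta)^{c-b-1}(1-x\theta)^{-a}\,d\theta = \frac{\Gamma(b)\,\Gamma(c-b)}{\Gamma(c)}\,F(a, b, c, x)$,

we have $b = n$, $c - b = \tfrac12$, $a = \tfrac12$, and therefore (45) can be expressed in terms
of a hypergeometric series as

(47)    $I_n = \dfrac{\Gamma(n)\,\Gamma(\tfrac12)}{\Gamma(n+\tfrac12)}\,\dfrac{1}{(\psi+1)^n}\,F\!\left(\tfrac12,\, n,\, n+\tfrac12,\, \dfrac{\psi-1}{\psi+1}\right)$.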


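The series (47) is convergent since $\dfrac{\psi-1}{\psi+1}$ is less than unity. Thus the distribution of w, from (44), is

(48)    $\dfrac{2^n C\,\Gamma(n)\,\Gamma(\tfrac12)}{\Gamma(n+\tfrac12)}\,\dfrac{1}{(\cosh w + 1)^n}\,F\!\left(\tfrac12,\, n,\, n+\tfrac12,\, \dfrac{\cosh w - 1}{\cosh w + 1}\right) dw$,

and the distribution of $\psi = \cosh w$ is

(49)    $\dfrac{2^{n+1} C\,\Gamma(n)\,\Gamma(\tfrac12)}{\Gamma(n+\tfrac12)}\,\dfrac{1}{(\psi+1)^{n+\frac12}(\psi-1)^{\frac12}}\,F\!\left(\tfrac12,\, n,\, n+\tfrac12,\, \dfrac{\psi-1}{\psi+1}\right) d\psi$.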
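The reduction of the integral $\int_0^\infty (\cosh\phi + \psi)^{-n}\,d\phi$ to a hypergeometric series can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the series is summed as a plain partial sum, and the direct integral is approximated by Simpson's rule on a truncated range.

```python
import math

def hyp2f1(a, b, c, x, terms=500):
    """Partial sum of the Gauss hypergeometric series F(a, b, c, x), for |x| < 1."""
    total = 1.0
    term = 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
    return total

def integral_by_series(n, psi):
    """Closed form: Gamma(n)Gamma(1/2)/Gamma(n+1/2) * (psi+1)^(-n) * F(1/2, n, n+1/2, x)."""
    x = (psi - 1.0) / (psi + 1.0)
    const = math.gamma(n) * math.gamma(0.5) / math.gamma(n + 0.5)
    return const * (psi + 1.0) ** (-n) * hyp2f1(0.5, n, n + 0.5, x)

def integral_by_quadrature(n, psi, upper=60.0, steps=60000):
    """Simpson's rule for the integral of (cosh(phi) + psi)^(-n) over 0 <= phi <= upper."""
    h = upper / steps
    def f(phi):
        return (math.cosh(phi) + psi) ** (-n)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

# Compare the two evaluations for a few illustrative (n, psi) pairs.
for n, psi in [(2, 1.0), (3, 3.0), (5, 1.5)]:
    print(n, psi, integral_by_series(n, psi), integral_by_quadrature(n, psi))
```

The series converges slowly when ψ is large, because its argument then approaches unity; the partial sum used here is adequate only for moderate ψ.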


We notice that the distribution of ψ expressed in (49) is very similar to the
r-distribution expressed in terms of hypergeometric series, except that in the 

first case the argument is — ; —-, while in the second case it is -— ; —- where 

i// + 1 1 + V 

p = pr. Hotelling [5] has obtained a very rapidly convergent hypergeometric 

series for the distribution of the correlation coefficient since | p | < 1. But 

for the distribution of ψ, we cannot obtain a more rapidly convergent series than

(49), since the values of ψ lie between 1 and ∞.


7. Summary and remark. Two hypotheses concerning the comparison of 
correlation coefficients of two samples from bivariate normal populations have 
been considered. The appropriate test criteria for each hypothesis have been 
derived by the use of a transformation of the variates. The distributions of 
certain of the criteria have been obtained in the special case where N = N'. 
Incidentally the distribution of Wilks' z for p = 2 and any values of $a_1$ and $a_2$
has been derived. 

Again though we assume throughout the paper that $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, the
tests can be generalized to fit the case where the ratios $\sigma_1/\sigma_2 = k$, $\sigma_1'/\sigma_2' = k'$
are known, but are different from unity. In the latter case we can apply the
transformation

2/i = w&i , y t = vhxi ; 

/ / > / r i 

2/1 = 101*1 , 2/2 = 102*2 ; 

where 

Wiki = wjct — 1 , w[k'i — Wiki — 1 , 

so that after transformation the variances of each pair of y's are equal.

The writer is deeply indebted to Professor Harold Hotelling and Dr. Abraham 
Wald for their advice and suggestions in the preparation of this paper. 





REFERENCES 

[1] F. A. Brandner, Biometrika, Vol. 25 (1933), p. 102.

[2] D. B. DeLury, Annals of Math. Stat., Vol. 9 (1938), p. 145.

[3] R. A. Fisher, Metron, Vol. 1 (1921), p. 3.

[4] H. Hotelling, Biometrika, Vol. 28 (1936), p. 321.

[5] H. Hotelling, Lectures delivered at Columbia University (1940-41).

[6] H. Hotelling, Annals of Math. Stat., Vol. 11 (1940), p. 271.

[7] C. T. Hsu, Annals of Math. Stat., Vol. 11 (1940), p. 410.

[8] S. Kullback, Annals of Math. Stat., Vol. 5 (1934), p. 263.

[9] J. Neyman and E. S. Pearson, Biometrika, Vol. 20 (1928), p. 175.

[10] J. Neyman and E. S. Pearson, Bull. Acad. Polonaise Sci. Lettres A (1931), p. 460.

[11] E. S. Pearson and J. Neyman, Bull. Acad. Polonaise Sci. Lettres A (1930), p. 73.

[12] G. W. Snedecor, Statistical Methods, Collegiate Press, Ames, Iowa, 1937.

[13] A. Wald, Lectures delivered at Columbia University (1939-40).

[14] A. Wald and R. J. Brookner, Annals of Math. Stat., Vol. 12 (1941), p. 137.

[15] S. S. Wilks, Biometrika, Vol. 24 (1932), p. 471.



ON RANDOMNESS IN ORDERED SEQUENCES 

By L. C. Young 

Westinghouse Electric and Manufacturing Company

It is frequently desirable to examine an ordered sequence of measurements
for the presence of non-random variability, concern over any particular type of
variability being limited. Unless the sequence is one containing replicated
observations, current methods of analysis often restrict an investigation to
tests for specific forms of variability, such as particular orders of regression and
periodicity. In order to simulate replication, arbitrary grouping of data is
occasionally used and followed by some test of variance; this practice, however,
is likely to add an element of bias to the investigation.

Under these conditions, it would be convenient to have the means of testing a
series for the presence of general regression, before proceeding to test for that of
a specific type. It is the purpose of this paper to present, as briefly as possible,
a statistic designed for this preliminary type of examination, and to demonstrate
its application.

If a given sequence of measurements be denoted by

$X_1, X_2, \ldots, X_n$

then the magnitude of

$C = 1 - \dfrac{\sum_{i=1}^{n-1} (X_i - X_{i+1})^2}{2 \sum_{i=1}^{n} (X_i - \bar X)^2}$

will be dependent upon the arrangement of the n observations upon which it is 
based. C will have n! possible values for a given sample, corresponding to the 
number of permutations of n items. 


1. Moments of the distribution of C in terms of the moments of a
finite sequence. Writing C in terms of $x_1, \ldots, x_n$, representing the deviations
of $X_1, \ldots, X_n$ from their sample mean of n measurements,

$C = 1 - \dfrac{\sum_{1}^{n-1} (x_i - x_{i+1})^2}{2 \sum_{1}^{n} x_i^2} = \dfrac{x_1^2 + x_n^2 + 2 \sum_{1}^{n-1} x_i x_{i+1}}{2 \sum_{1}^{n} x_i^2}$.


In order to find the mean value of C for a given sample, it must be summed
over all values obtained from the n! permutations of the measurements.
Dealing with the numerator alone of the expression given above:

$S_p\left[x_1^2 + x_n^2 + 2\sum_{1}^{n-1} x_i x_{i+1}\right] = S_p\,x_1^2 + S_p\,x_n^2 + 2\,S_p \sum_{1}^{n-1} x_i x_{i+1}$,

where $S_p$ denotes summation over the n! permutations.

There are n values of $x_i$, and n! arrangements. Each value $x_i$ is $x_1$ in
(n − 1)! of the arrangements: the same reasoning applies to $x_n$. The first two
terms of the summation, therefore, will be

$S_p\,x_1^2 = S_p\,x_n^2 = (n-1)! \sum_{1}^{n} x_i^2$.

With regard to the third term, there are 2(n − 1) of such cross-products for
each arrangement. Since the summation is taken over n! arrangements, $x_i x_k$
will be different than $x_k x_i$, and should be considered a separate term. Each
cross-product term, therefore, must occur $\dfrac{2(n-1)\,n!}{n(n-1)} = 2(n-1)!$ times throughout the n!
arrangements, since there are n(n − 1) possible cross-products among n different
items. The third term, then, will be

$2\,S_p \sum_{1}^{n-1} x_i x_{i+1} = 2(n-1)! \sum_{i \ne k} x_i x_k = -2(n-1)! \sum_{1}^{n} x_i^2$,

from which it may be seen that the mean value of C is zero for any sample.
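The statement that the mean of C over all n! arrangements vanishes can be confirmed by direct enumeration for a small sample. The following is a minimal sketch, not part of the original paper; the sample values and the function name `c_statistic` are arbitrary illustrative choices, and exact rational arithmetic is used so that the mean comes out exactly zero.

```python
from fractions import Fraction
from itertools import permutations

def c_statistic(seq):
    """C = 1 - sum of squared successive differences / (2 * sum of squared deviations)."""
    n = len(seq)
    mean = sum(Fraction(v) for v in seq) / n
    dev = [Fraction(v) - mean for v in seq]
    numerator = sum((dev[i] - dev[i + 1]) ** 2 for i in range(n - 1))
    denominator = 2 * sum(d * d for d in dev)
    return 1 - numerator / denominator

sample = [3, 7, 8, 12, 20]  # arbitrary illustrative measurements
values = [c_statistic(p) for p in permutations(sample)]
mean_c = sum(values) / len(values)
print(len(values), mean_c)
```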

The same method may be applied in order to find the second and higher
moments of C. Squaring the numerator of the expression and expanding,

$S_p\left[x_1^2 + x_n^2 + 2\sum_{1}^{n-1} x_i x_{i+1}\right]^2 = S_p\left[x_1^4 + x_n^4 + 2x_1^2 x_n^2 + 4x_1^2 \sum_{1}^{n-1} x_i x_{i+1} + 4x_n^2 \sum_{1}^{n-1} x_i x_{i+1} + 4\left(\sum_{1}^{n-1} x_i x_{i+1}\right)^2\right]$.

Performing the summation $S_p$ term by term we obtain

$\dfrac{S_p\left[x_1^2 + x_n^2 + 2\sum_{1}^{n-1} x_i x_{i+1}\right]^2}{n!} = \dfrac{2(2n-3)\left(\sum_{1}^{n} x_i^2\right)^2 - 2n \sum_{1}^{n} x_i^4}{n(n-1)}$,

whence the second moment of C for any sample is given by

$M_2 = \dfrac{2n - 3 - m_4/m_2^2}{2n(n-1)}$,

where $m_2$ and $m_4$ are the second and fourth moments, respectively, of the n
observations about their mean.
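The expression for the second moment, (2n − 3 − m₄/m₂²) / (2n(n − 1)), may be compared with a brute-force average of C² over every arrangement. The sketch below is illustrative only: the function names and the six sample values are assumptions made for the example, and exact fractions are used so the two results can be compared without rounding.

```python
from fractions import Fraction
from itertools import permutations

def deviations(sample):
    """Deviations of the sample values from their mean, as exact fractions."""
    n = len(sample)
    mean = sum(Fraction(v) for v in sample) / n
    return [Fraction(v) - mean for v in sample]

def c_from_deviations(x):
    """C written in terms of deviations x_1..x_n from the sample mean."""
    cross = sum(x[i] * x[i + 1] for i in range(len(x) - 1))
    return (x[0] ** 2 + x[-1] ** 2 + 2 * cross) / (2 * sum(v * v for v in x))

def second_moment_by_enumeration(sample):
    """Average of C^2 over all n! arrangements of the sample."""
    values = [c_from_deviations(p) for p in permutations(deviations(sample))]
    return sum(v * v for v in values) / len(values)

def second_moment_by_formula(sample):
    """(2n - 3 - m4/m2^2) / (2n(n - 1)), with m2, m4 the moments about the mean."""
    n = len(sample)
    dev = deviations(sample)
    m2 = sum(d ** 2 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return (2 * n - 3 - m4 / m2 ** 2) / (2 * n * (n - 1))

sample = [1, 4, 6, 11, 13, 20]  # arbitrary illustrative measurements
print(second_moment_by_enumeration(sample), second_moment_by_formula(sample))
```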

In like manner, the third and fourth moments of the distribution of C for a
given sample of n observations are found to be

$M_3 = \dfrac{-6 + 4(n-3)\,\dfrac{m_3^2}{m_2^3} + 9\,\dfrac{m_4}{m_2^2} - 3\,\dfrac{m_6}{m_2^3}}{4n(n-1)(n-2)}$,

$M_4 =$


8 (ft — 1 )(to — 2)(n 


- 3) L 


24to (to - 3) 2 - 48to(4to - 9) — 

mi 


- 24n(3n - 17 to -f 27) ~ + (8n a - 45 to 2 - 23n + 210) 

to 2 

+ 16 (2n 2 + 5n - 21) ^- 8 + 4(17 to 2 - 37to + 42) —* 

m 2 ml 


- (7« 2 + 13n - 6) f|. 

m 2 J 


2. Distribution of C for samples drawn from a normal universe. The
first four moments of the distribution of C for samples drawn from a given population
may be derived from the above formulae by substituting the mean values
of $\dfrac{m_4}{m_2^2}$, $\dfrac{m_3^2}{m_2^3}$, etc. of samples from such a population. For normal samples containing
n observations, for example, the following mean values apply, as obtained
by the method presented by R. A. Fisher [1, 2]:

$\dfrac{m_3^2}{m_2^3} = \dfrac{6(n-2)}{(n+1)(n+3)}$,

$\dfrac{m_4}{m_2^2} = \dfrac{3(n-1)}{(n+1)}$,

$\dfrac{m_4^2}{m_2^4} = \dfrac{3(3n^3 + 23n^2 - 63n + 45)}{(n+1)(n+3)(n+5)}$,

$\dfrac{m_3 m_5}{m_2^4} = \dfrac{60(n-1)(n-2)}{(n+1)(n+3)(n+5)}$,

$\dfrac{m_6}{m_2^3} = \dfrac{15(n-1)^2}{(n+1)(n+3)}$,

$\dfrac{m_8}{m_2^4} = \dfrac{105(n-1)^3}{(n+1)(n+3)(n+5)}$.

Replacement of the sample moment ratios by the mean values of those ratios 
for normal samples yields the following moments of C : 


$M_1 = 0$,    $M_2 = \dfrac{n-2}{(n-1)(n+1)}$,

$M_3 = 0$,    $M_4 = \dfrac{3(n^2 + 2n - 12)}{(n-1)(n+1)(n+3)(n+5)}$.
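The passage from the general second moment to its value for normal samples is a single substitution, which can be verified exactly. In the sketch below (an illustration only, with hypothetical function names), the normal-sample mean value 3(n − 1)/(n + 1) of m₄/m₂² is inserted into (2n − 3 − m₄/m₂²) / (2n(n − 1)) and compared with (n − 2)/((n − 1)(n + 1)).

```python
from fractions import Fraction

def second_moment_general(n, ratio_m4_m2sq):
    """Second moment of C for a given value of m4 / m2^2."""
    return (2 * n - 3 - ratio_m4_m2sq) / (2 * n * (n - 1))

def second_moment_normal(n):
    """Second moment of C for normal samples of n observations."""
    return Fraction(n - 2, (n - 1) * (n + 1))

def mean_ratio_normal(n):
    """Mean value of m4 / m2^2 in normal samples of n observations."""
    return Fraction(3 * (n - 1), n + 1)

for n in (5, 10, 25):
    print(n, second_moment_general(n, mean_ratio_normal(n)), second_moment_normal(n))
```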





Compatible results for the case of normal samples have been obtained by 
Williams [3], using another method. 

From the above results, the value of

$\beta_2 = \dfrac{3(n^2 + 2n - 12)(n-1)(n+1)}{(n-2)^2 (n+3)(n+5)}$

is seen to approach normality as the sample size is increased.

Inasmuch as the distribution of C for normal samples is limited in both direc¬ 
tions and is symmetrical, it is apparent that the Pearson Type II distribution 
may be considered representative. Fitting this curve to the moments given 
above, the equation of the frequency distribution is given by

$y = y_0 \left(1 - \dfrac{C^2}{a^2}\right)^m$,

where

$m = \dfrac{n^4 - n^3 - 13n^2 + 37n - 60}{2(n^3 - 13n + 24)}$,

$a^2 = \dfrac{(n^2 + 2n - 12)(n - 2)}{n^3 - 13n + 24}$,

$y_0 = \dfrac{\Gamma(2m + 2)}{a \cdot 2^{2m+1}\,[\Gamma(m + 1)]^2}$.
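The constants of the fitted Type II curve can be computed directly from n. The sketch below is an illustration, not the author's procedure; the function name is hypothetical. It relies on two standard properties of the curve y = y₀(1 − C²/a²)^m, namely that its variance is a²/(2m + 3) and its β₂ is 3(2m + 3)/(2m + 5), and these are compared with the second moment and β₂ of C for normal samples.

```python
import math

def type2_parameters(n):
    """Return (m, a^2, y0) of the Pearson Type II curve fitted to the moments of C."""
    d = n ** 3 - 13 * n + 24
    m = (n ** 4 - n ** 3 - 13 * n ** 2 + 37 * n - 60) / (2 * d)
    a2 = (n ** 2 + 2 * n - 12) * (n - 2) / d
    # y0 = Gamma(2m + 2) / (a * 2^(2m+1) * Gamma(m + 1)^2), computed through logarithms
    log_y0 = (math.lgamma(2 * m + 2) - 0.5 * math.log(a2)
              - (2 * m + 1) * math.log(2) - 2 * math.lgamma(m + 1))
    return m, a2, math.exp(log_y0)

for n in (8, 10, 25):
    m, a2, y0 = type2_parameters(n)
    variance = a2 / (2 * m + 3)            # should equal (n - 2) / ((n - 1)(n + 1))
    beta2 = 3 * (2 * m + 3) / (2 * m + 5)  # should equal beta_2 of C
    print(n, round(m, 4), round(a2, 4), round(variance, 6), round(beta2, 4))
```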

The values of $\beta_2$ for the distribution, for various values of n, are as follows:

Sample size, n      $\beta_2$
      5             2.300
     10             2.570
     15             2.684
     20             2.750
     25             2.793
     50             2.889
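The tabulated values follow from the expression for β₂ given above. A short sketch (the function name `beta2` is an illustrative choice) that evaluates it for the same sample sizes:

```python
def beta2(n):
    """beta_2 of the distribution of C for normal samples of n observations."""
    numerator = 3 * (n ** 2 + 2 * n - 12) * (n - 1) * (n + 1)
    denominator = (n - 2) ** 2 * (n + 3) * (n + 5)
    return numerator / denominator

for n in (5, 10, 15, 20, 25, 50):
    print(n, round(beta2(n), 3))
```

Since β₂ tends to 3 as n increases, the values rise steadily toward the normal figure.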


Due to the effect of even moments higher than the fourth, the approximation 
afforded by the Type II curve is not reliable for samples containing less than 
about eight observations. As the sample size decreases below this limit, the 
extremes of the C distribution deviate increasingly from the extremes (±<z) 
of the fitted curve: with such a platykurtic distribution, therefore, the effect 
upon the lower significance levels vitiates the approximation.

Although either $\beta_2$ or the theoretical limits of the distribution of C could
have been employed as a parameter of the fitted curve, it was considered ex¬ 
pedient to use the former. In any case, of course, the advantage to be gained 
would be in connection only with samples containing few observations (less 
than eight). The evidence afforded by empirical sampling indicates that use 
of the limits as a parameter might render the approximation less valid. 





In order to facilitate use of the approximate distribution for samples of eight 
or more observations, the values of C associated with two probability levels are 
tabulated below in Table I. The ratio of each value of C to its standard error 
is also shown, to demonstrate the approach to normality. The significance 
levels recorded exclude 10% and 2% of the area under the curve, respectively. 
In most practical applications, these will be the 5% and 1% levels, respectively, 
since only positive values of C exceeding the tabulated value will ordinarily be 
considered significant. The tabulations were prepared from tables of the 
function $I_x(p, q)$ [5], where q = .5 and p = m + 1, with the transformation

$x = 1 - \dfrac{C^2}{a^2}$.
TABLE I

Significance levels of the absolute value of C

                          P = .10                     P = .02
Sample size, n      C.10      C.10/σ_C          C.02      C.02/σ_C
      8            .5088       1.6486          .6686       2.1664
      9            .4878       1.6492          .6456       2.1826
     10            .4689       1.6494          .6242       2.1958
     11            .4517       1.6495          .6044       2.2068
     12            .4362       1.6495          .5860       2.2161
     13            .4221       1.6495          .5691       2.2241
     14            .4092       1.6494          .5534       2.2310
     15            .3973       1.6493          .5389       2.2369
     16            .3864       1.6492          .5254       2.2423
     17            .3764       1.6492          .5128       2.2470
     18            .3670       1.6491          .5011       2.2513
     19            .3583       1.6489          .4900       2.2550
     20            .3502       1.6488          .4797       2.2585
     21            .3426       1.6488          .4700       2.2616
     22            .3355       1.6486          .4609       2.2647
     23            .3288       1.6485          .4521       2.2676
     24            .3224       1.6484          .4440       2.2700
     25            .3165       1.6484          .4361       2.2717
Normal (n = ∞)                 1.6447                      2.3262
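Entries of Table I can be checked against the fitted curve without recourse to tables of the incomplete beta function. The sketch below is only an illustration: it assumes the Type II form y = y₀(1 − C²/a²)^m with the constants m, a² and y₀ given earlier, integrates the two tails numerically by Simpson's rule, and uses a hypothetical function name.

```python
import math

def type2_tail(c, n, steps=20000):
    """P(|C| > c) under the fitted Type II curve, by Simpson's rule over [c, a]."""
    d = n ** 3 - 13 * n + 24
    m = (n ** 4 - n ** 3 - 13 * n ** 2 + 37 * n - 60) / (2 * d)
    a = math.sqrt((n ** 2 + 2 * n - 12) * (n - 2) / d)
    y0 = math.exp(math.lgamma(2 * m + 2) - math.log(a)
                  - (2 * m + 1) * math.log(2) - 2 * math.lgamma(m + 1))

    def density(t):
        # max(...) guards against a tiny negative base from rounding at t = a
        return y0 * max(0.0, 1.0 - (t / a) ** 2) ** m

    h = (a - c) / steps
    total = density(c) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(c + i * h)
    return 2.0 * total * h / 3.0  # factor 2: both tails of the symmetrical curve

print(type2_tail(0.5088, 8))   # tabulated as the P = .10 level for n = 8
print(type2_tail(0.6686, 8))   # tabulated as the P = .02 level for n = 8
print(type2_tail(0.3165, 25))  # tabulated as the P = .10 level for n = 25
```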


The distribution of C for normal samples containing 20 or more observations
is sufficiently normal, for most practical cases and for the more common significance
levels, to permit use of a table of areas under the normal curve, in conjunction
with the standard error $\sigma_C = \sqrt{\dfrac{n-2}{(n-1)(n+1)}}$. The 5% significance levels
shown in Table I result, at worst, in a one per cent error of probability estimate,
if the normal approximation is used in their place: that is, if 1.6447 times the
standard error is used instead of the tabulated significance level, the probability
will be .0505 at most, for the values of n which are tabulated.
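The closeness of the normal approximation for n ≥ 20 can be seen by comparing 1.6447 σ_C with the tabulated levels. The sketch below is illustrative; the dictionary holds the P = .10 column of Table I for n = 20 to 25, and the names are arbitrary.

```python
import math

def sigma_c(n):
    """Standard error of C for normal samples of n observations."""
    return math.sqrt((n - 2) / ((n - 1) * (n + 1)))

# P = .10 column of Table I (the 5% level when only positive C is of interest)
tabulated_c10 = {20: 0.3502, 21: 0.3426, 22: 0.3355, 23: 0.3288, 24: 0.3224, 25: 0.3165}

for n, tabulated in tabulated_c10.items():
    approx = 1.6447 * sigma_c(n)
    print(n, tabulated, round(approx, 4), round(tabulated - approx, 4))
```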





3. General discussion on the application of C. It may be wondered
why the statistic C has been used, rather than the more easily computed statistic

$C' = \dfrac{\sum_{1}^{n-1} (X_i - X_{i+1})^2}{\sum_{1}^{n} x_i^2}$.

As far as a significance test is concerned, it clearly
does not matter which is used, since C and C' are linearly related. However, C
may be regarded as symmetrically distributed about 0 in samples from a normal 
population to within at least four moments. Excessive departure of C from 0 
may be taken as indicative of the presence of non-randomness in the series, the 
actual significance test being based, of course, on the probability of obtaining a 
departure larger than a given observed one, under the assumption of a random 
series. Positive values of C, in general, correspond to positive correlation while 
negative values correspond to negative correlation between successive obser¬ 
vations. 

There are various ways of detecting non-randomness in a series of observations, 
such as regression methods, analysis of variance, etc. The use of regression
methods implies that we must know in general the type of regression function
to be tried. C is a very flexible statistic, on the other hand, for testing the null
hypothesis that a series is random, no matter what the alternative hypothesis is.
A thorough study of C as a statistic for testing the hypothesis of randomness in 
an ordered series should include a study of the power function of C for hypotheses 
specifying various types of non-randomness. However, we shall simply appeal 
to intuition in proposing the statistic C, and forego power function considerations 
in this note. In practice, the advantage of using C increases with the length of a 
series: lack of randomness in a single sequence of ten or less observations may 
ordinarily be detected by regression methods, in fitting a low order polynomial. 
In a longer sequence of measurements, on the other hand, the presence of com¬ 
plicated regression or of periodicity is often sufficiently obscured by variation 
to elude detection by any other than a flexible method. 

The statistic could be used to advantage in the field of applied statistics, in 
the investigation not only of variate series but of attribute series as well. For 
the latter purpose, an effort to tabulate the relationship between the level of 
significance and the percentage of either attribute would facilitate statistical 
investigation of random arrangement. A direct application could thus be made 
to binomially distributed attributes by a scalar assignment (0, 1) to the dichotomy,
followed by a procedure similar to that presented above. Similarly, the
randomness of vectorial observations could be examined from the viewpoint of 
arrangement. The common method of treating such problems,—the “random 
walk method,”—has occasionally been found inadequate in dealing with specific 
forms of non-random order; this is especially true when the allocable cause of 
variation has a multi-directional effect. 

Needless to say, each of the fields of application considered so briefly above
would require development before a routine, efficient method of investigating
ordered arrangement could be established. Although probability level tables
have been provided in this paper for C as applied to normal samples, it is quite
evident that tables for samples from other parent distributions would be needed
for some of the applications mentioned above.


4. An illustration of the use of C. Although one example has already
been presented elsewhere [4] in which the distribution developed in Section 2
has been employed, a typical application of the statistic to an example in the
field of quality control will be given here in order to illustrate the mechanics of
solution. The data presented in Table II represent the percentages of defective
product turned out daily, over a period of twenty-four days, by a single workman.
The total output each day closely approximates five hundred parts; this fact is
brought out to explain the calculation of χ² for the observed series of percentages;
it has no bearing upon the use of C.


TABLE II

Percentage of product rejected

  Day       %, X        X²         d²
   1         7.4       54.76
   2         8.8       77.44      1.96
   3        11.4      129.96      6.76
   4        10.3      106.09      1.21
   5        11.9      141.61      2.56
   6        12.2      148.84       .09
   7        10.0      100.00      4.84
   8         8.4       70.56      2.56
   9         9.4       88.36      1.00
  10        10.9      118.81      2.25
  11         9.9       98.01      1.00
  12        11.8      139.24      3.61
  13        10.0      100.00      3.24
  14         8.9       79.21      1.21
  15         9.7       94.09       .64
  16         9.3       86.49       .16
  17        12.0      144.00      7.29
  18        12.3      151.29       .09
  19        10.3      106.09      4.00
  20         8.6       73.96      2.89
  21        10.4      108.16      3.24
  22        11.1      123.21       .49
  23         9.4       88.36      2.89
  24         8.2       67.24      1.44
Totals     242.6     2495.82     55.42

$n\bar X^2 = 2452.28$

$\sum x^2 = 43.54$

$C = 1 - \dfrac{55.42}{2 \times 43.54} = .3636$ (significant);    $\chi^2 = 21.618$ (23 degrees of freedom) (not significant).





The value of C derived from the data lies between the two significance levels
tabulated in Table I; there is reason to believe that the data are ordered, or non-random.
Computation of χ², however, has been carried out with the hypothesis
that all product was made under the same conditions (i.e., with a percentage
defective equal to 10.108%, the mean of the group). The value so obtained is
associated with a probability of about P = .50: the hypothesis is not disproved
by this test. In short, the variability of the twenty-four observations could be
considered random if it were not for the order of their arrangement.
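The mechanics of the calculation can be retraced from the daily percentages of Table II. The sketch below is an illustration under one stated assumption: the value for day 20 is taken as 8.6, the figure implied by the tabulated squared differences. Working from the raw values, the result should fall close to the tabulated C = .3636 and between the two significance levels for n = 24 in Table I (.3224 and .4440).

```python
# Daily percentages of product rejected (Table II); day 20 assumed to be 8.6.
percent_defective = [7.4, 8.8, 11.4, 10.3, 11.9, 12.2, 10.0, 8.4,
                     9.4, 10.9, 9.9, 11.8, 10.0, 8.9, 9.7, 9.3,
                     12.0, 12.3, 10.3, 8.6, 10.4, 11.1, 9.4, 8.2]

n = len(percent_defective)
mean = sum(percent_defective) / n

# Sum of squared successive differences (the d^2 column of Table II)
sum_d2 = sum((percent_defective[i + 1] - percent_defective[i]) ** 2
             for i in range(n - 1))

# Sum of squared deviations from the mean
sum_x2 = sum((v - mean) ** 2 for v in percent_defective)

c_value = 1.0 - sum_d2 / (2.0 * sum_x2)
print(n, round(mean, 3), round(sum_d2, 2), round(sum_x2, 2), round(c_value, 4))
```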

REFERENCES 

[1] R. A. Fisher, “Moments and product moments of sampling distributions,” Lond.
Math. Soc. Proc. (series 2) 30 (1929), pp. 199-238.

[2] R. A. Fisher, “The moments of the distribution for normal samples of measures of
departure from normality,” Roy. Soc. Proc., A 130 (1930), pp. 16-28.

[3] J. D. Williams, “Moments of the ratio of the mean square successive difference in
samples from a normal universe,” Annals of Math. Stat., Vol. 12 (1941), pp.
239-241.

[4] L. C. Young, “A critical appraisal of statistical methods in industrial management,”
presented at the annual meeting, American Society of Mechanical Engineers
(1940).

[5] K. Pearson (Editor), Tables of the Incomplete Beta-Function, Biometrika Office, London,
1934.



ON CERTAIN LIKELIHOOD-RATIO TESTS ASSOCIATED WITH THE 
EXPONENTIAL DISTRIBUTION 

By Edward Paulson
Washington, D.C. 


Various likelihood-ratio tests and their distributions in samples from a population
having the elementary probability law $\dfrac{1}{\sigma}\,e^{-(x-B)/\sigma}\,dx$, $B \le x < \infty$, have been
studied by Neyman and Pearson [1] and Sukhatme [2]. In this note the power
functions and the question of bias of several likelihood-ratio tests will be in¬ 
vestigated. The exponential distribution appears to be appropriate for dealing 
with problems involving the intervals of time between events which tend to be 
random, as for example the interval between consecutive telephone calls, or 
the interval between consecutive accidents to the same worker. 

To test the hypothesis H' that the location parameter B is equal to some
fixed value, it being assumed that the scale parameter σ is known, we can for
simplicity take the set Ω of admissible populations from which the sample might
have been drawn to be {−∞ < B < +∞, σ = 1}, while the subset ω from
which the sample must come when the hypothesis is true is {B = 0, σ = 1}.
Then the likelihood-ratio $\lambda_1$ for testing this hypothesis is

$\lambda_1 = \dfrac{P(\omega\ \text{max.})}{P(\Omega\ \text{max.})} = \dfrac{e^{-\sum_{i=1}^{n} x_i}}{e^{-\sum_{i=1}^{n} (x_i - x_1)}} = e^{-n x_1}$,

where $x_1$ is the smallest observation in a random sample of n. The region of
acceptance of this hypothesis consists of all points in sample space for which

$\lambda_{1\alpha} < \lambda_1 < 1$,

where $\lambda_{1\alpha}$ is chosen so that $\int_{\lambda_{1\alpha}}^{1} g_1(\lambda_1)\,d\lambda_1 = 1 - \alpha$, α being the level of significance
used and $g_1(\lambda_1)\,d\lambda_1$ being the distribution of $\lambda_1$ when B is really equal to zero.
The region $\lambda_{1\alpha} < \lambda_1 < 1$ is equivalent to the region in the sample space for which

$0 < x_1 < k_1$;    $k_1 = -\dfrac{\log \lambda_{1\alpha}}{n}$.

For any value of B the distribution of $x_1$ is known [3] to be

$\phi_1(x_1)\,dx_1 = n\,e^{-n(x_1 - B)}\,dx_1$.


Setting B = 0, the relationship between $k_1$ and α is

$\displaystyle\int_0^{k_1} n\,e^{-n x_1}\,dx_1 = 1 - \alpha$,    so    $e^{-n k_1} = \alpha$.

When B < 0, the power function P(B) for this test is

$P(B) = 1 - \displaystyle\int_0^{k_1} n\,e^{-n(x_1 - B)}\,dx_1 = 1 - e^{nB}[1 - \alpha]$.

When $0 < B < k_1$,

$P(B) = 1 - \displaystyle\int_B^{k_1} n\,e^{-n(x_1 - B)}\,dx_1 = \alpha\,e^{nB}$.

When $B > k_1$, P(B) = 1.

Since $e^{nB} > 1$ if B > 0 and also $e^{nB} < 1$ if B < 0, P(B) is obviously > α if
B ≠ 0. This test is therefore completely unbiased in the sense of Daly [4].
In addition, it is not difficult to prove that this test has the unusual property 
of being a uniformly most powerful test with respect to all alternatives. 
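The three branches of the power function for H' can be collected in one short routine. This is a sketch only, with hypothetical function names, taking σ = 1 as in the text and using the relation e^(−n k₁) = α: for B < 0 the power is 1 − e^(nB)(1 − α), for 0 ≤ B < k₁ it is α e^(nB), and beyond k₁ it is unity.

```python
import math

def critical_k1(n, alpha):
    """Upper limit k1 of the acceptance region 0 < x1 < k1, from exp(-n*k1) = alpha."""
    return -math.log(alpha) / n

def power_h1(B, n, alpha):
    """Power of the likelihood-ratio test of B = 0 when sigma = 1 is known."""
    k1 = critical_k1(n, alpha)
    if B < 0:
        return 1.0 - math.exp(n * B) * (1.0 - alpha)
    if B < k1:
        return alpha * math.exp(n * B)
    return 1.0

n, alpha = 10, 0.05
for B in (-0.5, -0.1, 0.0, 0.1, 0.2, 0.4):
    print(B, round(power_h1(B, n, alpha), 4))
```

The power equals α only at B = 0 and exceeds it on either side, which is the unbiasedness property stated above.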

To test the hypothesis H'' that the location parameter is equal to some fixed
value, say B = 0, when the scale parameter σ is unknown, the likelihood-ratio
is easily seen to be

$\lambda_2 = \left[\dfrac{\sum_{i=1}^{n} (x_i - x_1)}{\sum_{i=1}^{n} x_i}\right]^n = \left[\dfrac{1}{1 + \dfrac{n x_1}{\sum_{i=1}^{n} (x_i - x_1)}}\right]^n$.

The region of acceptance consists of all points in the sample space for which
$\lambda_{2\alpha} < \lambda_2 < 1$ where $\int_{\lambda_{2\alpha}}^{1} g_2(\lambda_2)\,d\lambda_2 = 1 - \alpha$. This is equivalent to the region

(1)    $0 < \dfrac{n(n-1)\,x_1}{\sum_{i=1}^{n} (x_i - x_1)} = t < k_2$;    $k_2 = (n-1)\left(\lambda_{2\alpha}^{-1/n} - 1\right)$.

The relation between $k_2$ and α is easily found from the distribution of t when
B = 0, which is known to be [3]

$\phi_2(t)\,dt = \dfrac{dt}{\left[1 + \dfrac{t}{n-1}\right]^n}$.

Therefore $\displaystyle\int_0^{k_2} \frac{dt}{\left[1 + \frac{t}{n-1}\right]^n} = 1 - \alpha$, so $\left[1 + \dfrac{k_2}{n-1}\right]^{-(n-1)} = \alpha$.

It is somewhat easier to find the power function of this test by considering the
region of acceptance as made up of points in the $x_1$, s plane for which

$0 < x_1 < \dfrac{k_2 s}{n}$,    where    $s = \dfrac{\sum_{i=1}^{n} (x_i - x_1)}{n - 1}$,

which is identical with the region in (1).
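The critical value k₂ and the acceptance region in the x₁, s plane lend themselves to a small sampling experiment. The sketch below is illustrative only: it assumes the relation [1 + k₂/(n − 1)]^(−(n−1)) = α, draws pseudo-random exponential samples with a fixed seed, and estimates the rejection rate both under the hypothesis and for one negative value of B, where the power 1 − e^(nB/σ)(1 − α) is expected. All names are hypothetical.

```python
import math
import random

def critical_k2(n, alpha):
    """k2 solving [1 + k2/(n - 1)]^(-(n - 1)) = alpha."""
    return (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)

def rejects(sample, k2):
    """True when the sample falls outside the region 0 < x1 < k2 * s / n."""
    n = len(sample)
    x1 = min(sample)
    s = sum(v - x1 for v in sample) / (n - 1)
    return not (0.0 < x1 < k2 * s / n)

def rejection_rate(n, alpha, B=0.0, sigma=1.0, trials=20000, seed=12345):
    """Monte Carlo estimate of the probability of rejecting the hypothesis B = 0."""
    rng = random.Random(seed)
    k2 = critical_k2(n, alpha)
    hits = 0
    for _ in range(trials):
        # expovariate takes the rate 1/sigma; B shifts the location
        sample = [B + rng.expovariate(1.0 / sigma) for _ in range(n)]
        hits += rejects(sample, k2)
    return hits / trials

print(rejection_rate(5, 0.05))          # expected near alpha = .05
print(rejection_rate(5, 0.05, B=-0.2))  # expected near 1 - exp(-1) * .95
```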





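The joint distribution of $x_1$ and s is [3]

$\psi_1(x_1, s)\,dx_1\,ds = \phi_3(x_1)\,dx_1\,\phi_4(s)\,ds$,

where

$\phi_3(x_1)\,dx_1 = \dfrac{n}{\sigma}\,e^{-n(x_1 - B)/\sigma}\,dx_1$

and

$\phi_4(s)\,ds = \left(\dfrac{n-1}{\sigma}\right)^{n-1} \dfrac{s^{n-2}\,e^{-(n-1)s/\sigma}}{(n-2)!}\,ds$.

When B < 0, the power function P(B) of this test is

$P(B) = 1 - \displaystyle\int_0^{\infty} ds \int_0^{k_2 s/n} \psi_1(x_1, s)\,dx_1 = 1 - e^{nB/\sigma}[1 - \alpha]$.

When B > 0, the power function is

(2)    $P(B) = 1 - \displaystyle\int_{Bn/k_2}^{\infty} ds \int_B^{k_2 s/n} \psi_1(x_1, s)\,dx_1$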


- + /[« -1; VV “ 


„ , , f aVe" 1 da; 

Jo 

which is the form in which the Incomplete Gamma Function has been tabulated 

[5]. _ 

Since σ must be positive, $e^{nB/\sigma} < 1$ if B < 0 and therefore P(B) > α in the
interval −∞ < B < 0. To show that P(B) is > α in the interval 0 < B < ∞,
it is simpler to work with the expression for P(B) as a double integral in (2),
than to differentiate the power function directly. Performing the integration
with respect to $x_1$,

$P(B) = 1 + \displaystyle\int_{Bn/k_2}^{\infty} \left[e^{-(k_2 s - Bn)/\sigma} - 1\right] \phi_4(s)\,ds$.

Differentiating with respect to B,

$P'(B) = \displaystyle\int_{Bn/k_2}^{\infty} \frac{n}{\sigma}\,e^{-(k_2 s - Bn)/\sigma}\,\phi_4(s)\,ds$.

The integral expression for P'(B) is obviously positive. Therefore since for
B > 0 the derivative is always positive the function must be monotonically
increasing in this interval (0 < B < +∞), so P(B) is > α when B > 0. Therefore
this test is also completely unbiased.

We now consider the hypothesis H''' that two samples are drawn from exponential
distributions with the same location parameter, assuming it is known
the samples must have come from two exponential distributions with the same
scale parameter. Given a sample of $n_1$ values of x drawn from $\dfrac{1}{\sigma}\,e^{-(x - B_1)/\sigma}\,dx$
and another independent sample of $n_2$ values of y drawn from $\dfrac{1}{\sigma}\,e^{-(y - B_2)/\sigma}\,dy$, the
hypothesis we wish to test is that $B_2 = B_1$. Let $x_1$ be the smallest of the $n_1$
values of x and $y_1$ be the smallest of the $n_2$ values of y, let L be the smallest of
the $n_1 + n_2 = N$ values of both x and y. Then the likelihood ratio for this
hypothesis is

$\lambda_3 = \left[\dfrac{\sum_{i=1}^{n_1} (x_i - x_1) + \sum_{i=1}^{n_2} (y_i - y_1)}{\sum_{i=1}^{n_1} (x_i - L) + \sum_{i=1}^{n_2} (y_i - L)}\right]^N = \left[\dfrac{1}{1 + \dfrac{z}{u}}\right]^N$,

where

$z = n_2 (y_1 - x_1)$, if $y_1 > x_1$,
$z = n_1 (x_1 - y_1)$, if $x_1 > y_1$,

and

$u = \sum_{i=1}^{n_1} (x_i - x_1) + \sum_{i=1}^{n_2} (y_i - y_1)$.

The region of acceptance, $\lambda_{3\alpha} < \lambda_3 < 1$, is equivalent to the region $0 < z < k_3 u$,
where $k_3$ is again a function of α, the level of significance, the exact relation
being

$\displaystyle\int_0^{k_3} \frac{(N-2)\,dt}{(1+t)^{N-1}} = 1 - \alpha$,    so    $\dfrac{1}{(1 + k_3)^{N-2}} = \alpha$.

It is known [3] that u is independent of z, and that its distribution is

$\phi_5(u)\,du = \dfrac{u^{N-3}\,e^{-u/\sigma}\,du}{\sigma^{N-2}\,(N-3)!}$.

The distribution of z is somewhat complicated; but it can be derived by observing
that the probability that z lies in any infinitesimal interval $z_1 \pm \tfrac12\,dz_1$ is
the sum of the probabilities that $n_2(y_1 - x_1)$ and $n_1(x_1 - y_1)$ lie in that interval
and by then using standard methods for finding the distribution of the difference
of two variates. For the case $G = B_2 - B_1 > 0$, the distribution $f_1(z)$ of z is

(3)    $f_1(z)\,dz = \dfrac{e^{-n_1 G/\sigma}}{(n_1 + n_2)\,\sigma}\left[n_1 e^{n_1 z/n_2\sigma} + n_2 e^{-z/\sigma}\right] dz$,    $0 < z < n_2 G$,

       $f_1(z)\,dz = \dfrac{n_1 e^{n_2 G/\sigma} + n_2 e^{-n_1 G/\sigma}}{(n_1 + n_2)\,\sigma}\,e^{-z/\sigma}\,dz$,    $n_2 G < z < \infty$.


For the case G < 0, the distribution of z can be derived from (3) by interchanging
$n_1$ and $n_2$ and putting −G in place of G.

The power function of this test can now be derived. For the case G > 0,
the power function P(G) is

(4)    $P(G) = 1 - \left\{ \displaystyle\int_0^{n_2 G/k_3} du \int_0^{k_3 u} f_1(z)\,\phi_5(u)\,dz + \int_{n_2 G/k_3}^{\infty} du \int_0^{n_2 G} f_1(z)\,\phi_5(u)\,dz + \int_{n_2 G/k_3}^{\infty} du \int_{n_2 G}^{k_3 u} f_1(z)\,\phi_5(u)\,dz \right\}$.

Upon integrating out and simplifying, the power function becomes 


P(G) 


= Jn^\ 

\ni + nsj 


+ 1 


N - 2; 


n 2 G 

kscr 


] 


+ 


_ i\n - 

\ni + 7h/ { |_ * 3< r 


^2 

fh + « 2 


(s—T 

\7ii — fix hi/ 


N - 2; 


6r(n a — ttifca) 

kiir 


The power function when G < 0 is easily derived from that for G > 0 by everywhere
interchanging $n_1$ and $n_2$ and substituting −G for G.

To show that P(G) > α when G ≠ 0, it is only necessary to show that the
derivative P'(G) of the power function is always positive when G > 0, and always
negative when G < 0. It is again considerably simpler to use the expression
for P(G) as a double integral. For the case G > 0, integrating with respect
to z in (4),

P(G) = 1-[1 - <f 

ni + tij 


4- 


l 


n * 0,k> ihe~ niah 
ni + n 2 


le ni ‘ ,ni ' - e- z "}& 0 Mu) du 




+ me- ni0/c ) 

n-sO/k, Tli + n 2 


[-e-'-t\ u oMn)du, 


where [/(e) ]„ = /(f>) — f(a). Upon differentiating and simplifying, 

P'(G) = - n MH - e -njO/<rj e «ifcju/njff __ e ~^]Mu) d U 

(ni + n 2 )cr Jo 


+ 


nin 2 


(ni + n 2 )<r Jn^/kt 


r 

^ n« 


e 


—Aju/o- ^n 2 n/a 


e- n ' a, °]Mu)du. 


Both integrals are easily seen to be always positive, so P'(G) is positive when
G > 0. In the same manner it can be shown that P'(G) is negative when G < 0.
Therefore this test is also completely unbiased.





The questions of investigating the bias of the likelihood-ratio tests for (a)
testing the hypothesis that $\sigma = \sigma_0$ when B is known and (b) testing the hypothesis
that $\sigma = \sigma_0$, nothing being known about the value of B, are practically
identical with the analogous problems for a normal distribution. The results
are also the same, for the λ test for (a) is completely unbiased, while that for
(b) is biased.

REFERENCES 

[1] J. Neyman and E. S. Pearson, "On the use and interpretation of certain test criteria
for purposes of statistical inference," Biometrika, Vol. 20A (1928), pp. 221-230.

[2] P. V. Sukhatme, "On the analysis of k samples from exponential populations with
special reference to the problem of random intervals," Stat. Res. Memoirs, Vol.
1, pp. 94-112.

[3] P. V. Sukhatme, "Tests of significance for samples of the χ² population with 2 degrees
of freedom," Annals of Eugenics, Vol. 8 (1937-38), pp. 54-55.

[4] Joseph F. Daly, "On the unbiased character of likelihood-ratio tests for independence
in normal systems," Annals of Math. Stat., Vol. 11 (1940), p. 2.

[5] K. Pearson (Editor), Tables of the Incomplete Gamma Function, Biometric Laboratory,
London, 1922.



ON THE MATHEMATICALLY SIGNIFICANT FIGURES IN THE 
SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS 

By L. B. Tuckerman

The National Bureau of Standards 

1. Introduction. The number of mathematically significant figures in the 
solution of simultaneous linear equations has received attention from a number 
of writers [1-6]. It is an important subject, not only in least squares and 
correlations, but in many other problems of science where simultaneous equa¬ 
tions arise: it may not be amiss, therefore, to examine it from a fresh start, 
particularly since (as will be shown) some of the rules that have been published 
on it fail in certain frequently occurring circumstances.

2. Definitions. Before proceeding into the subject it will be necessary to distinguish
between the computer’s terms “significant figures” and “determinate
significant figures.” The former are the figures that compose a number, without
the consecutive ciphers that precede or follow them merely to locate the decimal
point. “Determinate significant figures,” on the other hand, are figures that
are justifiable on computational grounds. From the computer’s point of view,
the number of significant figures remains independent of what is statistically
significant. To avoid confusion in what follows, the term “significant figures”
will be used in the computer’s sense, and the adjective “determinate” will be
supplied where mathematical determinacy is implied.

To avoid prolixity the term “observational error” will include any uncertainty
arising either from errors in the observations or from the statistical nature of
the problem (e.g. sampling errors, grouping errors, etc.). The observational error
of the result is independent of the particular sequence of computation followed and
the accuracy with which it is carried out.

The term “computational error” will include all the additional uncertainties
arising from the approximations occurring in the particular sequence of computation
used, including the “rounding off” of the final result. The computational
errors, unlike the observational errors, depend in general upon the sequence of the
intermediate steps used in the computation as well as on the number of significant
figures to which they are carried.

3. Criterion of an adequate computation. If the number written down at the 
end of a computation is to serve its purpose the maximum possible computational 
error must be suitably limited. 

A decimal representation of a number containing $f$ significant figures is subject
to an uncertainty (upper limit of absolute error) of 5 in the $(f + 1)$th place.
It has, therefore, a possible relative (not absolute) error of representation
somewhere between $5 \times 10^{-(f+1)}$ and $5 \times 10^{-f}$ in magnitude. This relative
computational error sets the limit to any valid final rounding off. Regardless of the
accuracy to which the intermediate steps of the computation have been carried,
this relative computational error introduced by the final rounding off alone
must be suitably limited.
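The limits just stated can be checked numerically. The following is a minimal sketch, not part of the original paper; the function name `max_relative_rounding_error` is hypothetical. It computes the half-unit in the last retained place, divided by the number itself, which should fall between $5 \times 10^{-(f+1)}$ and $5 \times 10^{-f}$.

```python
import math

def max_relative_rounding_error(x: float, f: int) -> float:
    """Largest relative error incurred by writing x to f significant figures."""
    exponent = math.floor(math.log10(abs(x)))      # decimal position of the leading digit
    half_unit = 0.5 * 10.0 ** (exponent - f + 1)   # 5 in the (f + 1)th place
    return half_unit / abs(x)

# Numbers with a small leading digit sit near the upper limit 5e-f,
# numbers with a large leading digit sit near the lower limit 5e-(f+1).
for x in (1.0, 2.5, 9.99):
    print(f"x = {x}: relative error limit {max_relative_rounding_error(x, 3):.3e}")
```

The spread of a factor of ten between the two limits is why a count of significant figures is only a coarse statement of relative accuracy.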

In case all of the accuracy obtainable from the data is not needed in the result,
the sum of the maximum possible computational error (including the error of
the final rounding off) and the maximum possible observational error must be
kept below the error which can be tolerated in the result.

In case all of the accuracy obtainable from the data is needed in the result,
the maximum possible computational error in the result (including the error of
the final rounding off) must be negligible in comparison with the uncertainty
(observational error) in the result arising from uncertainty in the data. Just
how small a fraction of the observational error is “negligible” is necessarily a matter
of judgment, and will depend upon the nature of the problem. A computational
error that would be wholly negligible in some ordinary computations might be
intolerably large in the adjustment of an accurate geodetic survey. In any case
the only basis for a valid judgment of the adequacy of the computation lies in a
comparison of (i) the maximum possible computational error that can arise in
the sequence of computations including the final “rounding off,” with (ii) the
observational error of the result arising from the observational errors inherent
in the data.

4. Propagation of error in a system of linear equations. Assume that

(1)    $\sum_i a_{si} x_i = b_s, \qquad s = 1, 2, \ldots, n,$

is a set of simultaneous linear equations derived in some way from observations
and in which the coefficients $a_{si}$ and the absolute terms $b_s$ may all be subject to
observational error. If the relative (not absolute) observational error of a
quantity $q$ be represented by $\delta q$ it may readily be seen that

(2)    $\delta x_j = -\sum_h \sum_k (x_k/x_j) A_{hj} a_{hk}\, \delta a_{hk} + \sum_s (b_s/x_j) A_{sj}\, \delta b_s,$

       $\delta\Delta = \sum_h \sum_k A_{hk} a_{hk}\, \delta a_{hk},$

where $\Delta$ is the determinant of the coefficients $a_{hk}$, and $A_{hk}$ is the term
corresponding to $a_{hk}$ in the reciprocal (not the adjoint) determinant.

5. Upper limits to observational errors. The sign and magnitude of the
relative errors $\delta a_{hk}$ and $\delta b_s$ are unknown, but we shall assume that it is possible
in any problem to assign to them upper limits

$|\delta a_{hk}|$ and $|\delta b_s|$

which in magnitude they cannot exceed. If the problem is such that the values
of each of the $\delta a_{hk}$ and the $\delta b_s$ are wholly independent of each other, it is then
possible that their magnitudes may all reach their upper limits $|\delta a_{hk}|$ and $|\delta b_s|$
simultaneously, in which case upper bounds of $\delta x_j$ and $\delta\Delta$ may be placed at

(3)    $|\delta x_j| = \sum_h \sum_k |(x_k/x_j) A_{hj} a_{hk}|\,|\delta a_{hk}| + \sum_s |(b_s/x_j) A_{sj}|\,|\delta b_s|,$

       $|\delta\Delta| = \sum_h \sum_k |A_{hk} a_{hk}|\,|\delta a_{hk}|.$

6. Indefiniteness of the problem in the general case. The values of the $\delta a_{hk}$
and $\delta b_s$ may not be independent of each other, in which circumstance knowledge
of the law of their dependence would make it possible to assign upper limits to
the magnitudes of $\delta x_j$ and $\delta\Delta$. These upper limits can not be larger than the
upper bounds shown in equation (3), and in special cases they will be much
smaller. Since the dependence of $\delta a_{hk}$ and $\delta b_s$ may in general have any form
whatever, cases can and will occur in which the upper limits of the relative
errors $\delta x_j$ and $\delta\Delta$ may have any ratio whatever.

7. Case of independent errors. Any general discussion of the errors that can
occur in $x_j$ and $\Delta$ must be based either on some special assumption or on the
limiting assumption that the errors are independent. It is this latter assumption
that underlies the usual discussion, and will be the basis of what follows.
Equation (3) gives the upper limit to the $\delta x_j$ and $\delta\Delta$ under these assumptions.

8. The ratios of $|\delta x_j|$ and $|\delta\Delta|$ are still indefinite in spite of the assumption
of independent errors in the coefficients. However, equation (3) does not determine
any definite ratio or inequality between the upper bounds $|\delta x_j|$ and $|\delta\Delta|$.
The nature of the observations may be such that some of the errors in the $a_{hk}$
and $b_s$ are very small and some relatively large. Not infrequently it is safe to
assume that some of them are free from appreciable error and to ascribe all the
error of the $x_j$ to the error in one or two of the $a_{hk}$ or $b_s$. If any statement of a
definite relationship, either as an equality or an inequality between $|\delta\Delta|$ and
the $|\delta x_j|$, is valid for all possible sets of linear equations, it must at least hold
in the special case in which the errors of all the $b_s$ and the errors of all except one
of the $a_{hk}$ are negligible.

If such a statement of a definite general relationship between these upper
limits of errors can be made, it must be possible to write down an equation or an
inequality between any one of the expressions $|A_{hk}|$ and some or all of the
corresponding expressions $|(x_k/x_j) A_{hj}|$, $j = 1, 2, \ldots, n$, that will remain true
no matter what be the values of the $a_{hk}$ and the $b_s$ in the original set of
simultaneous equations. It is obvious that the ratio of $|A_{hk}|$ and $|(x_k/x_j) A_{hj}|$,
$(j \ne k)$, depends upon the values of the $a_{hk}$, and sets of equations can be found
to give any assigned value to that ratio. It is therefore impossible to state any
rule that will restrict the ratio of the relative error of $\Delta$ and the relative error
of any one of the $x_j$, valid for all possible sets of linear equations.

9. Definite statement about the sum of the relative errors in the unknowns.
However, in the summation $\sum_j |\delta x_j|$ there occurs the term corresponding to
$j = k$, for which $|(x_k/x_j) A_{hj}| = |A_{hk}|$, so that under the assumption that the
$a_{hk}$ and $b_s$ are independent sources of error, we may write the inequality

(4)    $\sum_j |\delta x_j| \ge |\delta\Delta|,$

which states that the sum of the upper bounds to the relative errors of all the $x_j$
cannot be less than the upper bound to the relative error of the determinant $\Delta$.
A corresponding statement can easily be proved for the standard deviations.
A limiting case can be constructed in which the inequality (4) reduces to

(5)    $\sum_j |\delta x_j| = |\delta\Delta|,$

and in which all of the $|\delta x_j|$ are equal. For this case,

(6)    $|\delta\Delta| = n\,|\delta x_j|$ for all values of $j$.

If $n \ge 10$ it is obvious that there will be at least one more determinate significant
figure in each of the $x_j$ than in the determinant $\Delta$ of the coefficients.

It is frequently assumed that the number of determinate significant figures in
the solution for any unknown cannot exceed the number of determinate significant
figures in the determinant $\Delta$ of the coefficients. We see now that this statement
can not be generally valid, even under the assumption that the $a_{hk}$ and $b_s$
are independent sources of error. As a matter of fact, it is necessary in some
cases to compute some or even all of the unknowns to more significant figures
than are determinate in the determinant $\Delta$ of the coefficients, if one would retain
in the result all the accuracy that is obtainable from the data.

Cases in which the relative observational error of every one of the unknowns
is less than the relative error of the determinant $\Delta$ probably occur rarely in
practice; in fact the only ones that I have seen are those that I constructed
purposely to show that such a thing is possible. However, cases in which the
relative errors of one or several but not all of the unknowns are much smaller
than the relative error of the determinant $\Delta$ occur fairly frequently.

10. Remarks on the case of "near indeterminacy." The major interest in
curve fitting centers around the condition of "near indeterminacy," i.e., of a
small or near vanishing determinant $\Delta$. Even in the circumstance where the
relative error of the determinant is much greater than the relative error of some
or all of the coefficients and absolute terms, the relative error of one or more of
the unknowns may be much smaller than the relative error of the determinant,
as may be seen from what follows.





In accurate experimentation the endeavor is, wherever possible, to arrange the 
experiment so that the quantity sought comes directly from the measurement as 
represented by an equation such as 

(7) x = p. 

However, so ideal an experimental arrangement is rarely if ever possible, and it 
is a common experience to find that the measurements are represented by an 
equation such as 

(8) x + qy + rz + su + ··· = p,

where qy, rz, su, etc., are small corrections that must somehow be evaluated. 
For simplicity, the discussion will be confined to the almost trivial case 

(9) x + qy = p. 

Not infrequently the only way the correction can be evaluated is to rearrange the 
conditions of the experiment so that another equation is obtained in the form 

(10) x + q'y = p'. 

Sometimes the nature of the experiment is such that it is not possible to change
the coefficient of $y$ by more than a small amount, under which conditions

(11)    $q' = q(1 + \beta),$

and

(12)    $p' = p(1 + \alpha),$

where $\beta$ and $\alpha$ are small in comparison with 1. The solution of equations (9)
and (10) now gives

(13)    $x = \dfrac{\begin{vmatrix} p & q \\ p' & q' \end{vmatrix}}{\begin{vmatrix} 1 & q \\ 1 & q' \end{vmatrix}} = \dfrac{pq' - p'q}{q' - q} = p(1 - \alpha/\beta).$

The quantity $q' - q$ seen in the denominator of this equation is the determinant
$\Delta$ of the coefficients, and by equation (11) its value is $\beta q$. Since $\beta q$ is assumed
to be small here, the solution for $x$ encounters a near vanishing denominator.
It would, however, be wrong to assume that the number of determinate significant
figures in $x$ that can be obtained by solving the equations is necessarily
limited to the number of determinate significant figures in the denominator $\Delta$.

If the experimenter has been fortunate in finding suitable experimental conditions,
the denominator $\Delta = \beta q$, although small in comparison with either $q'$ or $q$,
will still not cause difficulty. It will be observed that the coefficients of $q'$ and $q$
in the denominator are equal (both being unity). Now if the coefficients $p$
and $p'$ in the numerator are nearly enough equal, so that $q'$ and $q$ occur in both
numerator and denominator so nearly proportionally that the uncertainties in
$q$ and $q'$ produce nearly compensating errors in both numerator and denominator,
then $x$ will be given to more determinate significant figures than are found in
the denominator $\Delta$. It can then be said that the experiment is successful in
evaluating the correction term $qy$ in equation (9).

On the other hand, in less fortunate circumstances, to the exasperation of the
experimenter, the denominator $\Delta = q' - q = \beta q$ is not only small, but $p'$ and $p$,
although still nearly equal, differ enough so that the errors in $q'$ and $q$ are not
compensated by the nearly equal coefficients in the numerator. The experiment
will then fail to improve the approximation $p$ for $x$ by failing to evaluate the
small correction $qy$ in equation (9). This would be an inherent defect in the
experiment and could not be removed by any manner of computation.

The same conclusion would of course be drawn from the coefficient of $p$ (viz.,
$1 - \alpha/\beta$) at the extreme right of equation (13). It is not the size of $\beta$ that
alone determines the number of determinate significant figures in $x$; it is rather
the ratio between $\alpha$ and $\beta$. In the fortunate experimental circumstances
described above, the near equality of $p'$ and $p$ offsets the near equality of $q'$ and $q$
by reducing the term $\alpha/\beta$ to a value small compared with unity; the term $\alpha/\beta$,
being small, acts to reduce the effect of the uncertainties in $q$ and $q'$ (i.e., in $q$
and $\beta$) in the evaluation of $x$. On the other hand, in less fortunate circumstances,
the correction term $\alpha/\beta$ can not now shield $x$ from the uncertainties in $q$
and $q'$ since the relative difference $\alpha$ between $p$ and $p'$ is not small enough to
reduce $\alpha/\beta$ to innocuity.


11. Numerical illustration of compensating errors. As a "horrible example"
especially constructed to emphasize the theoretical possibilities, take the
following special case:

(14)    $1000.10000\,x + 10.00000\,y = 1010.10000$
        $1000.00000\,x + 10.00000\,y = 1010.00000$

wherein it is assumed that the coefficients and the absolute terms (assumed to
be derived from the observational data) are all correct to the fifth decimal place
as given, and no closer estimate of their errors is possible. So far as known, the
upper limit to the absolute observational error of each is then the same, i.e.
$5 \times 10^{-6}$, but the coefficients of $x$ ($a_{11}$ and $a_{21}$), and the absolute terms ($b_1$ and $b_2$),
all have nine determinate significant figures, while the coefficients of $y$ ($a_{12}$
and $a_{22}$), have only seven. Thus,

$|\delta a_{11}| \le 5 \times 10^{-9}, \quad |\delta a_{21}| \le 5 \times 10^{-9}, \quad |\delta b_1| \le 5 \times 10^{-9}, \quad |\delta b_2| \le 5 \times 10^{-9},$

but

(15)    $|\delta a_{12}| \le 5 \times 10^{-7}, \quad |\delta a_{22}| \le 5 \times 10^{-7},$

and $x = 1$, $y = 1$, $\Delta = 1$, whereupon a substitution of values from (15) into (3)
gives the inequalities

(16)    $|\delta x| \le 3 \times 10^{-4}, \quad |\delta y| \le 3 \times 10^{-2}, \quad |\delta\Delta| \le 1.01 \times 10^{-2}.$

So far as known, the determinant $\Delta$ may thus be in error by as much as 1 per
cent, and $y$ by as much as 3 per cent, yet $x$ is known closer than 1/30th per cent.
Here the value of the unknown $x$ cannot be adequately represented by less than
four significant figures, and might even require five, in spite of the fact that
neither $\Delta$ nor $y$ requires more than three significant figures to represent all that
is certainly known about them.
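The figures in (16) can be reproduced by evaluating the upper bounds of equation (3) for the system (14). The sketch below is an illustration only, restricted to two unknowns and written in floating point; the variable names are hypothetical, and the relative error limits are taken as $5 \times 10^{-6}$ divided by the magnitude of each quantity.

```python
# Coefficients and absolute terms of system (14).
a = [[1000.1, 10.0], [1000.0, 10.0]]
b = [1010.1, 1010.0]
abs_err = 5e-6                                   # upper limit of absolute error of each datum

det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
# A[h][k]: cofactor of a[h][k] divided by the determinant (reciprocal determinant).
A = [[ a[1][1] / det, -a[1][0] / det],
     [-a[0][1] / det,  a[0][0] / det]]
x = [sum(A[s][j] * b[s] for s in range(2)) for j in range(2)]

da = [[abs_err / abs(a[h][k]) for k in range(2)] for h in range(2)]   # relative limits
db = [abs_err / abs(b[s]) for s in range(2)]

# Upper bounds of equation (3).
bound_x = [
    sum(abs(x[k] / x[j] * A[h][j] * a[h][k]) * da[h][k] for h in range(2) for k in range(2))
    + sum(abs(b[s] / x[j] * A[s][j]) * db[s] for s in range(2))
    for j in range(2)
]
bound_det = sum(abs(A[h][k] * a[h][k]) * da[h][k] for h in range(2) for k in range(2))

print(f"x = {x[0]:.6f}, y = {x[1]:.6f}, determinant = {det:.6f}")
print(f"bound on x: {bound_x[0]:.3e}, on y: {bound_x[1]:.3e}, on determinant: {bound_det:.3e}")
```

The same numbers also illustrate inequality (4): the two bounds on the unknowns add up to more than the bound on the determinant.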

The reason for this disparity in relative errors can be more easily seen by
substituting numerical values for all the coefficients in the expression for $x$
except $a_{12}$ and $a_{22}$. The possible relative errors of $a_{12}$ and $a_{22}$ are, as noted
above, about 100 times as great as the possible relative errors of $a_{11}$, $a_{21}$, $b_1$,
and $b_2$, and are the controlling errors in $\Delta$. In the solution

(17)    $x = \dfrac{1010.10000\,a_{22} - 1010.00000\,a_{12}}{1000.10000\,a_{22} - 1000.00000\,a_{12}},$

however, both $a_{12}$ and $a_{22}$ occur in both numerator and denominator, and moreover
the coefficient of each in the numerator is nearly equal to its coefficient in
the denominator, so that a change in either $a_{12}$ or $a_{22}$ changes both numerator
and denominator nearly proportionally, with the result that their ratio $x$ is
known much more accurately than either the numerator or the denominator $\Delta$.
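The compensation can be seen directly by perturbing $a_{12}$ and $a_{22}$ in (17). In the sketch below (an illustration, not from the paper) both coefficients are shifted by their full limit of $5 \times 10^{-6}$ in opposite directions, which is the least favorable combination for the denominator; exact rational arithmetic is used so that no rounding of the sketch itself enters.

```python
from fractions import Fraction

def x_and_det(a12, a22):
    """Numerator over denominator of (17), returned together with the denominator."""
    num = Fraction("1010.1") * a22 - Fraction("1010") * a12
    det = Fraction("1000.1") * a22 - Fraction("1000") * a12
    return num / det, det

e = Fraction("0.000005")                       # limit of absolute error in a12 and a22
x0, det0 = x_and_det(Fraction(10), Fraction(10))
x1, det1 = x_and_det(Fraction(10) - e, Fraction(10) + e)

rel_det = abs(det1 - det0) / det0              # relative change in the determinant
rel_x = abs(x1 - x0) / x0                      # relative change in x
print(f"relative change in determinant: {float(rel_det):.3e}")
print(f"relative change in x:           {float(rel_x):.3e}")
```

The determinant moves by about 1 per cent while $x$ moves by about one hundredth of that.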

This kind of compensation of errors in a computation is not confined to the
solution of simultaneous equations (and it is not an infrequent occurrence in
other computations). This is one of the many reasons why it is impossible to
give general rules for the retention of significant figures that will be valid for
all types of computations.

12. Geometrical analogy. Moulton [4] illustrated his reasoning by the
following geometrical analogy. The solution of three linear equations is equivalent
to finding the point of intersection of three planes. When the determinant of
the coefficients is small in comparison with the coefficients themselves, these
planes are either nearly parallel, or the line of intersection of any two of them
is nearly parallel to the third. In these cases small uncertainties in the location
of any one of the planes correspond to large uncertainties in the position of their
point of intersection.

In the first circumstance the planes might all be nearly parallel to one of the
three coordinate planes, with the result that large uncertainty would afflict the
value of the determinant and two of the unknowns, the third being much more
accurately determined.

In the second circumstance, the line of intersection of two of the planes might
be nearly parallel to one of the coordinate axes. When that happens, large
uncertainty will afflict the value of the determinant, but only one of the unknowns,
the other two being much more accurately determined.

This geometrical analogy can be extended to cover simultaneous equations
with any number of unknowns. Near-vanishing of the determinant $\Delta$ of the
coefficients necessarily implies relatively large uncertainties in the determinant
and also in at least one of the unknowns, but not necessarily in all of them.
These are, of course, very special cases, but, as noted above, they are of frequent
occurrence in actual problems.


13. Evaluation of computational error. The relative computational error in
$x_j$ must be kept within certain definite limits which depend upon the particular
problem to be solved (section 3). To do this it is necessary to be able to calculate
an upper bound to the relative computational error inherent in any particular
sequence of computations.

In many computations it is easy to write down a simple formula that will set
an upper bound to the relative computational error involved in that particular
sequence. This formula contains numbers $f_1$, $f_2$, $f_3$, etc., each representing the
number of significant figures accurately computed at some particular step.
Once a simple formula for relative computational error is written down, it is
easy to choose values of $f_1$, $f_2$, $f_3$, etc. that will give an upper bound to the
relative computational error not larger than the permissible limit of maximum
possible computational error outlined in section 3. This method of determining
an upper bound of the relative computational error should be used whenever such
a simple formula can be found. For example, to compute $x$ from equation (13)
we may use the following sequence: $r_1 = q' - q$, $r_2 = r_1/q = \beta$, $r_3 = p' - p$,
$r_4 = r_3/p = \alpha$, $r_5 = r_4/r_2 = \alpha/\beta$, $r_6 = 1 - r_5 = 1 - \alpha/\beta$, $r_7 = p r_6 = p(1 - \alpha/\beta) = x$.
$x$ may then be written as a function of these partial results, viz.:

(18)    $x = r_7 = p r_6 = p(1 - r_5) = p(1 - r_4/r_2) = p(1 - r_3/(p r_2)) = p(1 - r_3 q/(p r_1)).$

Applying first order error theory we find

(19)    $|\epsilon(x)| \le \left| \dfrac{\alpha/\beta}{1 - \alpha/\beta} \right| \{ |\epsilon(r_1)| + |\epsilon(r_2)| + |\epsilon(r_3)| + |\epsilon(r_4)| + |\epsilon(r_5)| \} + |\epsilon(r_6)| + |\epsilon(r_7)|,$

where $\epsilon(r_i)$ represents the relative error in $r_i$ arising from the computation by
which $r_i$ was determined from the preceding partial results, $r_1, r_2, \ldots, r_{i-1}$,
and $\epsilon(x)$ is the total relative computational error in $x$ when so computed. It
is easy to keep $\epsilon(x)$ within any desired limits by suitably limiting each error term
of (19). Since a computation accurate to $f$ significant figures involves a relative
computational error not greater than $5 \times 10^{-f}$, any desired limits can then be
set to each error term of (19) by a proper choice of the number of significant
figures that should be carried in that step.
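A small experiment makes the use of (19) concrete. The sketch below is an illustration under stated assumptions: the data values are hypothetical, the same number of figures $f$ is carried in every step, and Python's `decimal` context is used to round each partial result to $f$ significant figures. The observed relative error in $x$ is then compared with the right-hand side of (19) with every $|\epsilon(r_i)|$ set to $5 \times 10^{-f}$.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN
from fractions import Fraction

def x_with_f_figures(p, q, p2, q2, f):
    """Follow the sequence r_1 ... r_7, rounding every step to f significant figures."""
    ctx = Context(prec=f, rounding=ROUND_HALF_EVEN)
    r1 = ctx.subtract(q2, q)            # q' - q
    r2 = ctx.divide(r1, q)              # beta
    r3 = ctx.subtract(p2, p)            # p' - p
    r4 = ctx.divide(r3, p)              # alpha
    r5 = ctx.divide(r4, r2)             # alpha / beta
    r6 = ctx.subtract(Decimal(1), r5)   # 1 - alpha / beta
    return ctx.multiply(p, r6)          # r_7 = x

def bound_19(p, q, p2, q2, f):
    """Right-hand side of (19) with every |eps(r_i)| taken as 5 * 10**-f."""
    P, Q, P2, Q2 = (Fraction(v) for v in (p, q, p2, q2))
    ratio = ((P2 - P) / P) / ((Q2 - Q) / Q)          # alpha / beta, exact
    u = Fraction(5, 10 ** f)
    return abs(ratio / (1 - ratio)) * 5 * u + 2 * u

# Hypothetical data, chosen only for illustration.
p, q = Decimal("3.14159"), Decimal("0.0271828")
p2, q2 = Decimal("3.14786"), Decimal("0.0285577")
exact = (Fraction(p) * Fraction(q2) - Fraction(p2) * Fraction(q)) / (Fraction(q2) - Fraction(q))

for f in (4, 6, 8):
    x = x_with_f_figures(p, q, p2, q2, f)
    actual = abs(Fraction(x) - exact) / abs(exact)
    print(f"f = {f}: x = {x}, error {float(actual):.2e}, bound {float(bound_19(p, q, p2, q2, f)):.2e}")
```

Because (19) is a first order statement, the comparison should be read with a small allowance for second order terms.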





Unfortunately there seem to be no reasonably simple formulae for determining
upper bounds of the relative computational errors that arise in the solution of
simultaneous linear equations in more than two variables. This does not absolve
the computer from the necessity of ensuring that his computational errors
are suitably limited.

The method I have found most economical is to carry the solution of simultaneous
linear equations to the capacity of the machine, and as each partial result
$r_i$ is obtained, write it as

$r_i(1 \pm \epsilon_i),$

where $r_i$ is the value actually found and $\epsilon_i$ is a positive number representing the
accumulation of uncertainty introduced by all preceding steps in the computation.
At the end of the computation each of the unknowns is found in the form

(20)    $x_j(1 \pm \epsilon_j),$

where $x_j$ represents the value found and $\epsilon_j$ is the upper bound of the relative
computational error in $x_j$.

A comparison of $\epsilon_j$ with the upper bound of the observational error $|\delta x_j|$ of
equation (3) will then indicate whether the computation is adequate. If the
comparison shows that the computation was inadequate, it will show in which
steps the number of significant figures $f_i$ was too small, and by how much.
The computer can recompute, carrying these steps to the requisite number of
figures with the assurance that his recomputation will then be adequate. The
comparison will further indicate in which steps if any the number of significant
figures $f_i$ was larger than necessary.
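The bookkeeping described above, in which every partial result carries its accumulated relative uncertainty, can be mimicked in a few lines. This is only a sketch of one possible arrangement: the class `Tracked`, the constant `U` (a machine assumed to carry ten significant figures), and the data values are all hypothetical, the data are treated as exact, and the propagation rules are first order.

```python
from dataclasses import dataclass

U = 5e-10  # assumed rounding error per step (ten significant figures carried)

@dataclass(frozen=True)
class Tracked:
    value: float
    err: float = 0.0   # upper bound on accumulated relative computational error

    def __mul__(self, other):
        # Relative errors of the factors add; the step adds its own rounding.
        return Tracked(self.value * other.value, self.err + other.err + U)

    def __truediv__(self, other):
        return Tracked(self.value / other.value, self.err + other.err + U)

    def __sub__(self, other):
        # Absolute errors add; a small difference magnifies the relative error.
        diff = self.value - other.value
        absolute = abs(self.value) * self.err + abs(other.value) * other.err
        return Tracked(diff, absolute / abs(diff) + U)

# Hypothetical data for equations (9) and (10).
p, q = Tracked(3.14159), Tracked(0.0271828)
p2, q2 = Tracked(3.14786), Tracked(0.0285577)
one = Tracked(1.0)

beta = (q2 - q) / q
alpha = (p2 - p) / p
ratio = alpha / beta
x = p * (one - ratio)
print(f"x = {x.value:.6f} (1 ± {x.err:.2e})")
```

For this sequence the accumulated bound agrees with the right-hand side of (19) when every step is charged the same rounding error, which is a useful check on the bookkeeping.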

When a computer has thus set suitable upper bounds to the relative computational
error in the solution of a set of linear equations, he is in a position to plan
solutions of future similar sets so as to perform his computations more
economically and yet safely. This is especially true when the solution of simultaneous
linear equations arises week after week in routine testing.

14. Conclusions. Summary rules have been published, purporting to be safe
guides to computers in avoiding needless work, and ensuring that the computations
are carried to a sufficient degree of accuracy. Many of them are useful
guides for certain types of computation and for limited ranges of the numerical
values entering into the computation, but none of those that I have seen can be
used generally. The only safe rule, where the matter is of importance, is to
calculate the maximum possible computational error that can enter in the
particular sequence of computation followed, and make sure that it is kept within
the necessary limits.

It is sometimes necessary to carry the intermediate steps of a computation to
many significant figures beyond the significant figures given in the data, or kept
in the result. The relative error of one of the unknowns may be very much
smaller than the relative errors of the data from which it is computed, while the
relative error of another of the unknowns may be larger. The methods of
ensuring that the computations are adequate are outlined in section 13.

For the best sequence to follow in the elimination of the unknowns, I shall
pass along a suggestion of Dr. W. Edwards Deming which he gave in one of our
discussions of this subject. I venture to pass it along, because it has worked in
every special case that I have constructed in an attempt to prove that it does
not hold generally. If ever the suggestion fails, the computer may change the
sequence; but in any case he is obliged, as stated above, to calculate the maximum
possible computational error that can enter into his calculations. Dr. Deming’s
suggestion is this: “To evaluate some but not all of the unknowns to the highest
possible computational accuracy, retaining as few significant figures as possible
in the intermediate steps, solve the equations by successive elimination,
eliminating first and evaluating last the unknowns of greatest inherent relative
accuracy.”

15. Summary. Expressions are given for the maximum observational error
in the unknowns of a system of simultaneous linear equations, in terms of the
relative errors of the coefficients and absolute terms therein. In order to extract
all the information possible from a system of linear equations representing
observational results, it is not sufficient in general to assume that the relative errors
in the unknowns are as large as the relative error in the determinant of the
system. In many problems the computation of some of the unknowns must
therefore be carried to more significant figures than are determinate in the
determinant of the system. Methods are outlined for evaluating computational
error in the solution of linear equations to ensure that the computations are
adequate.

In conclusion I wish to express my thanks to Dr. W. Edwards Deming who 
has given much of his time to assist me in the preparation of this paper. He has 
made valuable suggestions on the material to be included and the general manner 
of presentation. In addition he has criticized the manuscript in detail and 
assisted in the final revision. 


REFERENCES 

[1] Edward B. Roessler, Science, Vol. 84 (1936), pp. 289-290.

[2] Joseph Berkson, ibid., p. 437.

[3] P. J. Rulon, ibid., pp. 483-484.

[4] F. R. Moulton, ibid., pp. 574-576.

[5] W. Edwards Deming, Science, Vol. 86 (1937), pp. 451-454; Least Squares (Department
of Agriculture Graduate School, 1938), pp. 105, 111, 121, 135.

[6] I. M. H. Etherington, Edinb. Math. Soc. Proc., Vol. 3 (1932), Part 2, pp. 107-117.



ON MECHANICAL TABULATION OF POLYNOMIALS 


By J. C. McPherson 
International Business Machines Corporation 

1. Introduction. The purpose of this paper is to show how automatic
accounting machines, which have been used previously in evaluating such
quantities as $\sum x^n$ and $\sum x^{n-1} y$, may be used in the preparation of mathematical
tables of integral powers, of polynomials, and of functions which can be
approximated by polynomials. These tables may be prepared for any desired intervals
of the argument such as 1, T V, rbo> i> i> etc. 

The method is an adaptation of the general theory of "cumulative" or
"progressive" totals which has proved useful in computing moments and product
moments both with and without accounting machines. The reader unfamiliar
with the mathematical method and its machine applications might refer to such
presentations as those of Hardy [1], Mendenhall and Warren [2, 3], Razram and
Wagner [4], Brandt [5], and Dwyer [6, 7]. The main feature of the method is
the computation of summed products or of summed powers by means of successive
cumulated additions. It is shown in this paper how it is possible to use
this same process in constructing tables of powers and tables of polynomials.


2. The Cumulative Formulas. If the numbers $F_x$ are defined and finite
for $x = 1, 2, 3, \ldots, (a - 1), a$, and if these values of $F_x$ are cumulated for
$x = a$, $x = a - 1$, etc., then the value in the row headed by $x = 1$ can be written
as ${}^1T_1$. If these cumulations are cumulated successively with the superscript
indicating the order of the cumulation and the subscript indicating the value of $x$
which heads the row, then

${}^2T_1 = \sum x F_x, \qquad {}^3T_1 = \sum \dfrac{(x + 1)x}{2!} F_x,$

${}^3T_2 = \sum \dfrac{x(x - 1)}{2!} F_x, \qquad {}^4T_1 = \sum \dfrac{(x + 2)(x + 1)x}{3!} F_x,$

and in general for $i \le j$,

(1)    ${}^jT_i = \sum \dfrac{[x + j - (i + 1)]^{(j-1)}}{(j - 1)!} F_x,$

where $y^{(m)}$ denotes the factorial power $y(y - 1) \cdots (y - m + 1)$.

Formula (1) is basic to much of the previous work involving cumulative totals.
Various authors have studied such important special cases as (A) where $F_x$
equals the frequency function $f_x$, (B) where $F_x = x f_x$, and (C) where $F_x$ equals
the sum of all the values of $y$ having the same $x$ value. These special cases have
been found very useful in computing moments and product moments.
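Formula (1) can be checked against the cumulations themselves. The sketch below is an illustration with hypothetical values of $F_x$; it forms the successive cumulations from the bottom row upward and compares each entry ${}^jT_i$ with the closed form, written with a binomial coefficient, which equals the factorial power divided by $(j - 1)!$.

```python
from itertools import accumulate
from math import comb

def cumulate(values):
    """Cumulate from the bottom row (x = a) upward; entry x is the sum over rows x..a."""
    return list(accumulate(reversed(values)))[::-1]

def cumulation_table(F, orders):
    """table[j][i - 1] holds jT_i for j = 1..orders and i = 1..a."""
    table, column = {}, list(F)
    for j in range(1, orders + 1):
        column = cumulate(column)
        table[j] = column
    return table

def formula_1(F, j, i):
    """Right-hand side of (1); i <= j keeps every binomial argument non-negative."""
    return sum(comb(x - i + j - 1, j - 1) * Fx for x, Fx in enumerate(F, start=1))

F = [3, 1, 4, 1, 5, 9]                 # hypothetical values of F_x for x = 1..6
T = cumulation_table(F, 5)
sum_xF = sum(x * Fx for x, Fx in enumerate(F, start=1))
print(T[2][0], sum_xF)                 # 2T_1 alongside the sum of x F_x
```

The restriction $i \le j$ matters here: it is what makes the terms for $x < i$ vanish in the closed form, so that summing over all $x$ agrees with the cumulation, which only reaches rows $x \ge i$.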



The moments may be expressed in terms of the cumulations in a variety of
ways. The diagonal formulas have the differences of zero as coefficients and are
expressed in terms of ${}^1T_1$, ${}^2T_1$, ${}^3T_2$, ${}^4T_3$, ${}^5T_4$, etc. The columnar formulas,
whose coefficients have been recently studied [6, 7], are expressed in terms of
cumulations of the same order, ${}^jT_i$, with $j$ fixed. Razram and Wagner [4]
have given formulas which utilize the entries of different rows and different
columns but which demand fewer entries for the formulas. Razram and Wagner
worked out the formulas through $\sum x^4$ but the argument holds for $\sum x^i F_x$.
For purposes of comparison the values of $\sum x^i F_x$, $i = 0, 1, 2, 3, 4$, as they appear
in the diagonal, columnar, and Razram-Wagner systems are presented in
Table I.


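TABLE I

Values of $\sum x^i F_x$ for $i = 0, 1, 2, 3, 4$

Sum               Diagonal                                                        Columnar                                                  Razram-Wagner

$\sum F_x$        ${}^1T_1$                                                       ${}^1T_1$                                                 ${}^1T_1$
$\sum x F_x$      ${}^2T_1$                                                       ${}^2T_1$                                                 ${}^2T_1$
$\sum x^2 F_x$    ${}^2T_1 + 2\,{}^3T_2$                                          ${}^3T_1 + {}^3T_2$                                       ${}^3T_{1+2}$
$\sum x^3 F_x$    ${}^2T_1 + 6\,{}^3T_2 + 6\,{}^4T_3$                             ${}^4T_1 + 4\,{}^4T_2 + {}^4T_3$                          ${}^2T_1 + 6\,{}^4T_2$
$\sum x^4 F_x$    ${}^2T_1 + 14\,{}^3T_2 + 36\,{}^4T_3 + 24\,{}^5T_4$             ${}^5T_1 + 11\,{}^5T_2 + 11\,{}^5T_3 + {}^5T_4$           ${}^3T_{1+2} + 12\,{}^5T_{2+3}$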


In developing the theory of the later sections of this paper I have developed
further formulas of the type shown by Razram and Wagner since these formulas
have fewer terms than do those of the other systems and the coefficients are
factorable by $(j - 1)!/2$. These formulas for $\sum x^s F_x$, with $s$ even, feature such terms
as ${}^3T_1 + {}^3T_2 = {}^3T_{1+2}$, ${}^5T_{2+3}$, etc., so that there are two entries from the same
column. For the purposes of this paper it is preferable to have a single entry
from each column and this situation results from continued application of the
formula

(2)    ${}^jT_{i+(i+1)} = {}^jT_i + {}^jT_{i+1} = {}^{j-1}T_i + 2\,{}^jT_{i+1}.$

The formulas for $\sum x^s F_x$ with $s \le 12$ are given. The alternative forms are given
for the formulas involving even values of $s$.

(3)

$\sum F_x = {}^1T_1, \qquad \sum x F_x = {}^2T_1,$

$\sum x^2 F_x = {}^3T_1 + {}^3T_2 = {}^3T_{1+2} = {}^2T_1 + 2\,{}^3T_2,$

$\sum x^3 F_x = {}^2T_1 + 6\,{}^4T_2,$

$\sum x^4 F_x = {}^3T_{1+2} + 12\,{}^5T_{2+3}$
$\qquad = {}^2T_1 + 2\,{}^3T_2 + 12\,{}^4T_2 + 24\,{}^5T_3,$

$\sum x^5 F_x = {}^2T_1 + 30\,{}^4T_2 + 120\,{}^6T_3,$

$\sum x^6 F_x = {}^3T_{1+2} + 60\,{}^5T_{2+3} + 360\,{}^7T_{3+4}$
$\qquad = {}^2T_1 + 2\,{}^3T_2 + 60\,{}^4T_2 + 120\,{}^5T_3 + 360\,{}^6T_3 + 720\,{}^7T_4,$

$\sum x^7 F_x = {}^2T_1 + 126\,{}^4T_2 + 1680\,{}^6T_3 + 5040\,{}^8T_4,$

$\sum x^8 F_x = {}^3T_{1+2} + 252\,{}^5T_{2+3} + 5040\,{}^7T_{3+4} + 20160\,{}^9T_{4+5}$
$\qquad = {}^2T_1 + 2\,{}^3T_2 + 252\,{}^4T_2 + 504\,{}^5T_3 + 5040\,{}^6T_3 + 10080\,{}^7T_4$
$\qquad\quad + 20160\,{}^8T_4 + 40320\,{}^9T_5,$

$\sum x^9 F_x = {}^2T_1 + 510\,{}^4T_2 + 17640\,{}^6T_3 + 151200\,{}^8T_4 + 362880\,{}^{10}T_5,$

$\sum x^{10} F_x = {}^3T_{1+2} + 1020\,{}^5T_{2+3} + 52920\,{}^7T_{3+4} + 604800\,{}^9T_{4+5} + 1814400\,{}^{11}T_{5+6}$
$\qquad = {}^2T_1 + 2\,{}^3T_2 + 1020\,{}^4T_2 + 2040\,{}^5T_3 + 52920\,{}^6T_3 + 105840\,{}^7T_4$
$\qquad\quad + 604800\,{}^8T_4 + 1209600\,{}^9T_5 + 1814400\,{}^{10}T_5 + 3628800\,{}^{11}T_6,$

$\sum x^{11} F_x = {}^2T_1 + 2046\,{}^4T_2 + 168960\,{}^6T_3 + 3160080\,{}^8T_4 + 19958400\,{}^{10}T_5$
$\qquad\quad + 39916800\,{}^{12}T_6,$

$\sum x^{12} F_x = {}^3T_{1+2} + 4092\,{}^5T_{2+3} + 506880\,{}^7T_{3+4} + 12640320\,{}^9T_{4+5}$
$\qquad\quad + 99792000\,{}^{11}T_{5+6} + 239500800\,{}^{13}T_{6+7}$
$\qquad = {}^2T_1 + 2\,{}^3T_2 + 4092\,{}^4T_2 + 8184\,{}^5T_3 + 506880\,{}^6T_3 + 1013760\,{}^7T_4$
$\qquad\quad + 12640320\,{}^8T_4 + 25280640\,{}^9T_5 + 99792000\,{}^{10}T_5$
$\qquad\quad + 199584000\,{}^{11}T_6 + 239500800\,{}^{12}T_6 + 479001600\,{}^{13}T_7.$

The derivation of these formulas is obtained with the use of (1), with the use of

(4)    ${}^jT_i = {}^jT_{i+1} + {}^{j-1}T_i,$

and with the use of formulas of lower order. For example we have from (1)

$\sum \dfrac{(x + 4)(x + 3)(x + 2)(x + 1)x}{5!} F_x = {}^6T_1$

so that

$\sum x^5 F_x = 120\,{}^6T_1 - 10 \sum x^4 F_x - 35 \sum x^3 F_x - 50 \sum x^2 F_x - 24 \sum x F_x$

which after substitution of $\sum x^4 F_x$, $\sum x^3 F_x$, etc. and simplification results in the
value ${}^2T_1 + 30\,{}^4T_2 + 120\,{}^6T_3$.

3. Tables of powers. If $F_x = 1$ when $x = a$, but is zero otherwise,
then $\sum x^s F_x$ is equal to $a^s$. It follows that the value of $a^s$ can be obtained from
the successive cumulations of this $F_x$ with the use of (3). For example in
Table II

TABLE II

$6^2 = {}^2T_1 + 2\,{}^3T_2 = 6 + 2(15) = 36,$

$6^3 = {}^2T_1 + 6\,{}^4T_2 = 6 + 6(35) = 216,$

$6^4 = {}^2T_1 + 2\,{}^3T_2 + 12\,{}^4T_2 + 24\,{}^5T_3 = 6 + 2(15) + 12(35) + 24(35) = 1296.$

The values of ${}^2T_1$, ${}^3T_2$, ${}^4T_2$ and ${}^5T_3$ for $a = 6$ are italicized in Table II.

To get the values of $5^2$, $5^3$, $5^4$, etc. it would be necessary to start to cumulate
from $x = 5$. Now since the values of ${}^1T_i$ are unity, it follows that the values
for $a = 5$ can be found by taking the entries above those for $a = 6$. Thus
${}^2T_1 = 5$, ${}^3T_2 = 10$, ${}^4T_2 = 20$, ${}^5T_3 = 15$ with $5^2 = 5 + 2(10)$, $5^3 = 5 + 6(20)$,
$5^4 = 5 + 2(10) + 12(20) + 24(15)$. It is evident in general that the values for
any $a^2$, $a^3$, $a^4$ can be obtained by taking the row headed by $a$ as the bottom row.
Thus using $a = 8$, we have $8^2 = 8 + 2(28)$, $8^3 = 8 + 6(84)$, etc. It then appears
that we may omit the $x$ column of Table II and consider the cumulations to be
ascending cumulations for $a$ rather than descending cumulations for $x$.
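The ascending view can be sketched as follows. This is an illustration, not the layout of Table II, which is not reproduced here; the helper names `cumulation_columns` and `T` are hypothetical. Column $j$ is the $j$-fold running total of 1, 0, 0, ..., and the entry ${}^jT_i$ belonging to a given $a$ is read $i - 1$ rows above the row headed by $a$.

```python
from itertools import accumulate

def cumulation_columns(rows, orders):
    """columns[j] is the j-fold running total of the sequence 1, 0, 0, ..."""
    column = [1] + [0] * (rows - 1)
    columns = {}
    for j in range(1, orders + 1):
        column = list(accumulate(column))
        columns[j] = column
    return columns

def T(columns, j, i, a):
    """Entry jT_i when the row headed by a is taken as the bottom row (needs a >= i)."""
    return columns[j][a - i]

cols = cumulation_columns(rows=10, orders=5)

def powers(a):
    """Square, cube and fourth power of a from formulas (3)."""
    square = T(cols, 2, 1, a) + 2 * T(cols, 3, 2, a)
    cube = T(cols, 2, 1, a) + 6 * T(cols, 4, 2, a)
    fourth = (T(cols, 2, 1, a) + 2 * T(cols, 3, 2, a)
              + 12 * T(cols, 4, 2, a) + 24 * T(cols, 5, 3, a))
    return square, cube, fourth

print(powers(6))   # built from the entries 6, 15, 35, 35 quoted in the text
print(powers(8))   # built from 8, 28, 84, ...
```

One set of columns thus serves every value of $a$ up to the number of rows cumulated.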

A more satisfactory course is to cumulate the coefficients so as to eliminate
the multiplications. Thus the value of $6\,{}^4T_2$ could be obtained without
multiplication by cumulating 6, 0, 0, 0, 0, ··· rather than 1, 0, 0, 0, ···. Several
cumulations may be carried on at the same time so that the additions are not
necessary and the tabulation results in a table of the desired powers.

In preparation of a power table, the formulas (3) become a series of instructions
on the way in which we are to do the cumulating. For instance the
formula:

$x^7 = 5040\,{}^8T_4 + 1680\,{}^6T_3 + 126\,{}^4T_2 + {}^2T_1,$

tells us that to form a table of the seventh power we must cumulate¹ the coefficient
5040 eight times; add in the coefficient 1680 when there are six operations remaining;
the coefficient 126 when there are four; and the coefficient 1 when there are two
remaining. A change in subscript tells us that the coefficient when first included
forms a separate total ahead of the ones already partly figured. When the subscript
does not change, the coefficient is to be included in the first summary card
total. The final cumulating operation prints the actual table.

To prepare a power table by machine we secure a set of cards punched all alike
with the numbers from 1 to 9 punched diagonally in successive columns across
the card. The machine is wired to add the coefficient of the highest term by
selecting the proper digits from the diagonals, cumulate after each card and
summary punch each total. This way of starting saves one cumulation. The
summary cards are cumulated repeatedly in the same manner until the number
of operations indicated by the highest term is completed. When the number of
operations remaining equals $j$ of another term ${}^{j}T_i$, a card for the coefficient of
that term is included in the tabulation ahead of the summary cards. This
automatically adds the new coefficient to each term of the series. When the
subscript $i$ in ${}^{j}T_i$ changes, the new coefficient card must form a separate total;
when it does not change, the coefficient card must tabulate in the first summary
card total.

¹ This operation is generally known as progressive totalling in machine operation.

To illustrate the tabulation of power tables, the formula for the cube table is:

$x^3 = 6\,{}^{4}T_2 + {}^{2}T_1$

The successive operations yield the following table:


                              TABLE III

                          Operation number
      x          1          2          3      4 ($x^3$)
      1          0          0          1          1
      2          6          6          7          8
      3          6         12         19         27
      4          6         18         37         64
      5          6         24         61        125
      6          6         30         91        216
      7          6         36        127        343
      8          6         42        169        512
      9          6         48        217        729
     10          6         54        271       1000


In actual machine work, operation 1 can be omitted and work begun with
operation 2. The machine is set to add the coefficient 6 of the highest term from
each card and an accumulated total is printed and punched for each card
tabulated, giving the results shown under operation 2. An additional card is punched
for the coefficient of the second term, 1, and placed ahead of the cards produced
in operation 2. The cumulation and punching is repeated, giving the results
shown under operation 3. The summary cards from this operation are
cumulatively tabulated, giving the results shown under operation 4, which is the
table of cubes desired.
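The four operations of Table III can be imitated with running totals. This is only an illustrative sketch of the cumulating scheme described above, with `itertools.accumulate` standing in for the progressive totalling of the machine; the function name `cube_table` is hypothetical.

```python
from itertools import accumulate


def cube_table(n):
    """Columns of Table III for x = 1..n (n >= 2), using additions only."""
    # Operation 1: the coefficient 6 of the highest term enters at x = 2.
    op1 = list(accumulate([0, 6] + [0] * (n - 2)))
    # Operation 2: progressive totals of operation 1.
    op2 = list(accumulate(op1))
    # Operation 3: the card for the coefficient 1 is placed ahead of the
    # summary cards of operation 2, so it is added into the first total.
    op3 = list(accumulate([op2[0] + 1] + op2[1:]))
    # Operation 4: progressive totals of operation 3, which are the cubes.
    op4 = list(accumulate(op3))
    return op1, op2, op3, op4


for row in zip(range(1, 11), *cube_table(10)):
    print(*row)
```

No multiplication occurs anywhere in the loop; each column is obtained from the previous one by addition alone.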

Similarly, for a table of the fourth power, the formula
$x^4 = 24\,{}^{5}T_3 + 12\,{}^{4}T_2 + 2\,{}^{3}T_2 + {}^{2}T_1$ indicates the following operations:


                              TABLE IV

                               Operation number
      x          1          2          3          4      5 ($x^4$)
      1          0          0          0          1          1
      2          0         12         14         15         16
      3         24         36         50         65         81
      4         24         60        110        175        256
      5         24         84        194        369        625
      6         24        108        302        671       1296
      7         24        132        434       1105       2401
      8         24        156        590       1695       4096
      9         24        180        770       2465       6561
     10         24        204        974       3439      10000
     11         24        228       1202       4641      14641
     12         24        252       1454       6095      20736




Note in operation 3 where the subscript does not change, the coefficient 2 is
added to the first card punched by the machine, while in operation 4 where it
changes, the coefficient 1 appears as a separate total.


4. Tables of polynomials. To tabulate values of $f(x) = a + bx + cx^2 + \cdots$
(where $a, b, c, \ldots$ are positive or negative coefficients) the method is similar
to that of preparing power tables except that the coefficients to be added are
determined by multiplying the coefficients of the formulas for the different powers
by the values $a$, $b$, $c$ etc., adding the coefficients of like terms in the various
formulas, and using these resultant coefficients in place of the simple coefficients
used in the power tables. Thus if we wish to tabulate values of
$f(x) = 4 + 3x + 2x^2 + x^5$ the coefficients are found as follows:

$4x^0 = 4\,{}^{1}T_0$
$+\,3x = 3\,{}^{2}T_1$
$+\,2x^2 = 2\,{}^{2}T_1 + 2 \cdot 2\,{}^{3}T_2$
$+\,x^5 = {}^{2}T_1 + 30\,{}^{4}T_2 + 120\,{}^{6}T_3$

$f(x) = 4\,{}^{1}T_0 + 6\,{}^{2}T_1 + 4\,{}^{3}T_2 + 30\,{}^{4}T_2 + 120\,{}^{6}T_3$

This equation gives instructions to perform six operations with 120 as
coefficient; adding the coefficient 30 as a separate total when there are 4 operations
remaining; adding 4 to the first summary card total when there are 3 operations;
adding 6 as a separate total when there are 2 operations remaining; and adding 4
on the last operation.

The first few totals appear thus:


                                   TABLE V

                                 Operation number
      x          1          2          3          4          5          6
      0                                                                 4
      1          0          0          0          0          6         10
      2          0          0         30         34         40         50
      3        120        120        150        184        224        274
      4        120        240        390        574        798       1072
      5        120        360        750       1324       2122       3194
      6        120        480       1230       2554       4676       7870
      7        120        600       1830       4384       9060      16930
      8        120        720       2550       6934      15994      32924
      9        120        840       3390      10324      26318      59242
     10        120        960       4350      14674      40992     100234
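The instructions read off from the cumulative formula can be followed by a short routine. This is a sketch under the assumption that each term $c\,{}^{j}T_i$ means "feed the coefficient $c$ into row $i$ when $j$ operations remain"; it does not model the card handling itself, so the distinction between a separate total and an addition to the first summary card appears only as the row at which a coefficient enters. The name `tabulate` is hypothetical.

```python
from itertools import accumulate


def tabulate(terms, rows):
    """Return one column per operation for x = 0 .. rows - 1.

    terms is a list of (coefficient, j, i) triples standing for c * jT_i.
    """
    total_ops = max(j for _, j, _ in terms)
    column = [0] * rows
    columns = []
    for op in range(1, total_ops + 1):
        remaining = total_ops - op + 1          # operations still to be run
        feed = list(column)
        for c, j, i in terms:
            if j == remaining:                   # coefficient card joins here
                feed[i] += c
        column = list(accumulate(feed))          # progressive totals
        columns.append(column)
    return columns


# f(x) = 4 + 3x + 2x^2 + x^5 in cumulative form, as derived in the text.
F_TERMS = [(120, 6, 3), (30, 4, 2), (4, 3, 2), (6, 2, 1), (4, 1, 0)]
for x, *ops in zip(range(11), *tabulate(F_TERMS, 11)):
    print(x, *ops)
```

The last column printed is the table of $f(x)$; the earlier columns correspond to operations 1 to 5 (with zeros where the printed table is blank).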


It is not necessary to confine these tables to values for whole numbers, as we
can tabulate equally well values of $f(x)$ for intervals of $x$ of .1, .01 or .001, or
of simple fractions, etc. In this case, before combining formulas for different
powers we multiply both sides by the desired interval raised to the power to which
$x$ is raised in that particular formula, then add like terms as before.

To tabulate the previous example in .1 intervals of $x$ we proceed as follows:

$4x^0 = 4.000\,{}^{1}T_0$
$3x/10 = .3\,{}^{2}T_1$
$2(x/10)^2 = .02\,{}^{2}T_1 + .04\,{}^{3}T_2$
$(x/10)^5 = .00001\,{}^{2}T_1 + .00030\,{}^{4}T_2 + .00120\,{}^{6}T_3$

$f(x) = 4\,{}^{1}T_0 + .32001\,{}^{2}T_1 + .04\,{}^{3}T_2 + .00030\,{}^{4}T_2 + .00120\,{}^{6}T_3$


                                   TABLE VI






                                 Operation number
      x          1          2          3          4          5          6
     .1          0          0          0          0     .32001    4.32001
     .2          0          0      .0003      .0403     .36031    4.68032
     .3      .0012      .0012      .0015      .0418     .40211    5.08243
     .4      .0012      .0024      .0039      .0457     .44781    5.53024
     .5      .0012      .0036      .0075      .0532     .50101    6.03125
     .6      .0012      .0048      .0123      .0655     .56651    6.59776
     .7      .0012      .0060      .0183      .0838     .65031    7.24807
     .8      .0012      .0072      .0255      .1093     .75961    8.00768
     .9      .0012      .0084      .0339      .1432     .90281    8.91049
    1.0      .0012      .0096      .0435      .1867    1.08951   10.00000
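The change of interval amounts to multiplying each power formula by the polynomial coefficient and by the interval raised to that power, then adding like terms. The sketch below does this with exact fractions. It assumes the closed form ${}^{j}T_i = \binom{k-i+j-1}{j-1}$ for the value of a unit term after $k$ steps, which is an assumption of this example rather than a statement from the text; `FORMULAS`, `cumulative_coefficients` and `value_at_step` are hypothetical names.

```python
from fractions import Fraction
from math import comb

# Formulas (3) for the powers that occur in f(x) = 4 + 3x + 2x^2 + x^5,
# written as {(j, i): coefficient of jT_i}.
FORMULAS = {
    0: {(1, 0): 1},
    1: {(2, 1): 1},
    2: {(2, 1): 1, (3, 2): 2},
    5: {(2, 1): 1, (4, 2): 30, (6, 3): 120},
}


def cumulative_coefficients(poly, h):
    """poly maps each power s to its coefficient; h is the tabular interval."""
    out = {}
    for s, a in poly.items():
        for term, c in FORMULAS[s].items():
            out[term] = out.get(term, 0) + a * c * h ** s
    return out


def value_at_step(coeffs, k):
    """Tabulated value after k steps (assumed closed form of each jT_i)."""
    return sum(c * comb(k - i + j - 1, j - 1) for (j, i), c in coeffs.items())


coeffs = cumulative_coefficients({0: 4, 1: 3, 2: 2, 5: 1}, Fraction(1, 10))
for term in sorted(coeffs):
    print(term, float(coeffs[term]))
print([float(value_at_step(coeffs, k)) for k in range(1, 11)])
```

Working in `Fraction` keeps the check exact; a machine counter would carry the same quantities as fixed-point decimals.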


Where any coefficients are negative in the equations expressed in ${}^{j}T_i$ terms,
they are simply added in as minus figures.

To round off the preceding function to 3 decimal places, we add 5 to the
constant term ${}^{1}T_0$ in the position to the right of the last decimal retained, i.e. in
this case the 4th decimal place. The constant term is then 4.0005.


      Exact      Counter reads      Prints
    4.32001         4.32051          4.320
    4.68032         4.68082          4.680
    5.08243         5.08293          5.082
    5.53024         5.53074          5.530
    6.03125         6.03175          6.031
    6.59776         6.59826          6.598
    7.24807         7.24857          7.248
    8.00768         8.00818          8.008
    8.91049         8.91099          8.910
   10.00000        10.00050         10.000
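The rounding rule (add 5 in the first discarded place, then print only the retained places) can be sketched with Python's `decimal` module. In the text the 5 is added once to the constant term, so every counter reading already contains it; here it is added per value for illustration. The function name `counter_and_print` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def counter_and_print(exact, places):
    """Return (counter reading, printed value) for a value kept to `places` decimals."""
    half_unit = Decimal(5).scaleb(-(places + 1))      # 0.0005 for 3 places
    counter = exact + half_unit
    # Printing only the first `places` decimals is a truncation of the counter.
    printed = counter.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)
    return counter, printed


for text in ("4.32001", "6.59776", "10.00000"):
    print(text, *counter_and_print(Decimal(text), 3))
```

Truncating after adding half a unit is equivalent to rounding to the nearest retained figure, which is why no separate rounding step is needed on the machine.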


5. Automatic calculation of polynomial coefficients. Frequently when
polynomials are being evaluated, the process of forming the coefficients can be
performed automatically from a punched-card table. Such a table consists of a
set of cards for each power $x^s$ containing the multiples of all the coefficients of
each of the terms ${}^{j}T_i$ in the formula (3) for that power. These multiples are
1, 2, 3, 4, ..., 9; 10, 20, 30, 40, ..., 90; 100, 200, ..., 900; 1000, 2000 etc.,
and may be produced automatically by making a linear table of each coefficient
in the manner described in this paper. Each card is punched with the
information called for by the heading of the following card form:


      s        j        i        multiple        coeff. × multiple
     07       06       03         00005               008400


The particular figures indicated are those which would be punched for the
term $5(1680)\,{}^{6}T_3$ in the representation of $5x^7$ according to formula (3).

The table is used by withdrawing the cards for the coefficients $a$, $b$, $c$, $d$, etc.
of the desired polynomial. For instance, if one of the polynomial coefficients is
$14485x^7$, we select from the $x^7$ section of the table all cards containing the
multiples 10000, 4000, 400, 80, and 5. In the $x^7$ table there are 4 cards for each
multiple, one each for terms ${}^{8}T_4$, ${}^{6}T_3$, ${}^{4}T_2$, and ${}^{2}T_1$. These cards are combined with
the cards selected for the other coefficients of the polynomial and sorted to bring
all cards for each ${}^{j}T_i$ together. The cards for each term ${}^{j}T_i$ are then
automatically added on the electric accounting machine.
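The card-selection step can be pictured as follows. This is a sketch only: the card layout is reduced to a tuple, the $x^7$ formula is the one quoted in section 3, and the names `X7`, `decimal_multiples`, `withdraw_cards` and `add_cards` are hypothetical.

```python
from collections import defaultdict

# Formula (3) for x^7 as quoted in section 3: {(j, i): coefficient of jT_i}.
X7 = {(8, 4): 5040, (6, 3): 1680, (4, 2): 126, (2, 1): 1}


def decimal_multiples(n):
    """Split a positive integer into its non-zero decimal parts."""
    digits = str(n)
    return [int(d) * 10 ** (len(digits) - 1 - k)
            for k, d in enumerate(digits) if d != "0"]


def withdraw_cards(s, coefficient, formula):
    """One card (s, j, i, multiple, coeff * multiple) per multiple and per term."""
    return [(s, j, i, mult, c * mult)
            for mult in decimal_multiples(coefficient)
            for (j, i), c in formula.items()]


def add_cards(cards):
    """Sort the cards by term jT_i and add the products, as the machine would."""
    totals = defaultdict(int)
    for _s, j, i, _mult, product in cards:
        totals[(j, i)] += product
    return dict(totals)


cards = withdraw_cards(7, 14485, X7)
print(decimal_multiples(14485), len(cards))
print(add_cards(cards))
```

For the coefficient 14485 this withdraws 5 multiples times 4 terms, that is 20 cards, and the totals are 14485 times each coefficient of the $x^7$ formula.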

6. Subdividing tables. In preparing tables it may be desired to prepare 
the table in more detail at certain points, giving values of the function at 1/10, 
1/20, 1/50, or 1/100, etc., of the interval of the rest of the table. This may 
readily be done by recalculating the coefficients of the cumulative terms, and 
using these values in the same manner as the original ones. 

There are many formulas for the determination of the subdivided differences
given in various texts on interpolation, such as those given by Comrie [8] and
Bower [9]. One effective method is to use formulas (3) to calculate the
subdivided differences. The values called for in the formula for the highest power
are taken from the table of the function at the regular interval, giving effect to
the rule involving subscripts. These coefficients are reduced by an amount
sufficient to cancel the coefficient of the highest cumulative term, and the
coefficients of the remaining cumulative terms are reduced in proportion according
to formula (3) for the highest power. Usually the coefficient of the highest term
of the formula will divide evenly into the coefficient taken from the table, and
the other reductions are calculated by multiplying this result by the other
coefficients of the formula. The highest remaining coefficient is then reduced
by an amount sufficient to cancel itself, and, by use of the formula (3) for the
power whose highest cumulative term matches the highest remaining coefficient,
the reduction to the remaining cumulative terms is calculated and subtracted.
The highest remaining coefficient is reduced in a like manner, and this process is
continued until all the cumulative coefficients have been analyzed.

The partial cumulative coefficients thus computed are multiplied by the
desired subdivision $1/m$ raised to the power of the corresponding formula (3),
and recombined to form the new coefficients, as shown in the example below.
In taking values from the table, when the subscript does not change, the tabular
value must be reduced by the amount of the higher coefficient with the same
subscript, to give effect to the rule that the coefficients in such cases are
increments (see last example in section 3).

To subdivide the polynomial of section 4 at $x = 7.0$, we take the italicized
values from Table V starting at $f(7)$ as ${}^{1}T_0$, and proceed as follows:

                 ${}^{6}T_3$   ${}^{5}T_3$   ${}^{4}T_2$   ${}^{3}T_2$   ${}^{2}T_1$   ${}^{1}T_0$
From Table V                  960                     10324
                     120     -120         3390        -3390        15994        16930
$F(x)$               120      840         3390         6934        15994        16930
$ax^5$               120                    30                         1
                              840         3360         6934        15993
$bx^4$                        840          420           70           35
                                          2940         6864        15958
$cx^3$                                    2940                       490
                                                       6864        15468
$dx^2$                                                 6864         3432
                                                                   12036
$ex$                                                               12036
                                                                                16930

Since the interval is 1/10 we have:

                    ${}^{6}T_3$   ${}^{5}T_3$   ${}^{4}T_2$   ${}^{3}T_2$   ${}^{2}T_1$   ${}^{1}T_0$
$x^5/10^5$ =          .00120                    .00030                     .00001
$35x^4/10^4$ =                     .0840        .04200        .0070        .00350
$490x^3/10^3$ =                                2.94000                     .49000
$3432x^2/10^2$ =                                            68.6400      34.32000
$12036x/10$ =                                                          1203.60000      +16930

$f(x) = .00120\,{}^{6}T_3 + .0840\,{}^{5}T_3 + 2.9823\,{}^{4}T_2 + 68.6470\,{}^{3}T_2 + 1238.41351\,{}^{2}T_1 +
16930\,{}^{1}T_0$ provides the coefficients for subtabulating the function at the desired
interval, beginning at the argument $x = 7.0$.
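The reduction and recombination just carried out by hand can be traced step by step. The sketch below is an illustration under stated assumptions: the power formulas are those quoted in sections 3 and 4, the leading term of each formula is taken to be the one with the largest $j$, and the names `FORMULAS`, `peel` and `recombine` are hypothetical.

```python
from fractions import Fraction

# Formulas (3) by power, {(j, i): coefficient of jT_i}, leading term first.
FORMULAS = {
    5: {(6, 3): 120, (4, 2): 30, (2, 1): 1},
    4: {(5, 3): 24, (4, 2): 12, (3, 2): 2, (2, 1): 1},
    3: {(4, 2): 6, (2, 1): 1},
    2: {(3, 2): 2, (2, 1): 1},
    1: {(2, 1): 1},
    0: {(1, 0): 1},
}


def peel(table_values):
    """Reduce the tabular coefficients power by power, highest power first."""
    remaining = dict(table_values)
    partial = {}
    for s in sorted(FORMULAS, reverse=True):
        lead = next(iter(FORMULAS[s]))                    # highest cumulative term
        a = Fraction(remaining.get(lead, 0), FORMULAS[s][lead])
        partial[s] = a
        for term, c in FORMULAS[s].items():               # subtract a * formula
            remaining[term] = remaining.get(term, 0) - a * c
    return partial


def recombine(partial, h):
    """Multiply each partial coefficient by h**s and add like terms."""
    new = {}
    for s, a in partial.items():
        for term, c in FORMULAS[s].items():
            new[term] = new.get(term, 0) + a * c * h ** s
    return new


# Row F(x) of the worked example, taken from Table V at x = 7.
AT_SEVEN = {(6, 3): 120, (5, 3): 840, (4, 2): 3390,
            (3, 2): 6934, (2, 1): 15994, (1, 0): 16930}
partial = peel(AT_SEVEN)
print({s: int(a) for s, a in partial.items()})
print({t: float(v) for t, v in recombine(partial, Fraction(1, 10)).items()})
```

The partial coefficients 1, 35, 490, 3432, 12036, 16930 are the coefficients of $f(7 + u)$ in powers of $u$, which is why the recombined formula starts the subtabulation at $x = 7.0$.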

7. Accuracy of Tables. When the values of the coefficients are not
exact, owing to the original values for $a$, $b$, $c$ etc. or the dropping of decimals in
the computation of the coefficients, the errors accumulate fairly rapidly. Each
coefficient will introduce its own error into the summation.

To maintain accuracy throughout a long table it is advisable to transform $f(x)$
by Horner's method of decreasing the roots [10, pp. 100-101], compute new
coefficients for the transformed equation at intervals, and prepare the table in
sections. Decreasing the roots by $r$ gives us a new starting point at $x = r$.
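Decreasing the roots by $r$ replaces $f(x)$ by $f(u + r)$, and the new coefficients come from repeated synthetic division. A minimal sketch follows; the function name `decrease_roots` and the choice to store coefficients with the highest power first are assumptions of this example.

```python
def decrease_roots(coeffs, r):
    """Coefficients of f(u + r) in powers of u, highest power first.

    coeffs lists the coefficients of f(x), highest power first.  Each pass is
    one synthetic division by (x - r); the remainders, read from the end,
    are the new coefficients.
    """
    c = list(coeffs)
    n = len(c)
    for stop in range(n, 1, -1):
        for k in range(1, stop):
            c[k] += r * c[k - 1]
    return c


# f(x) = x^5 + 2x^2 + 3x + 4, restarted at x = 7.
print(decrease_roots([1, 0, 0, 2, 3, 4], 7))
```

For the polynomial of section 4 and $r = 7$ the result is 1, 35, 490, 3432, 12036, 16930, so a fresh section of the table can be started at $x = 7$ from exact coefficients.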

Since two or more functions may be computed at one time, a function for
which the coefficients are not exact may be computed by adding in the usual
way from the starting values and subtracting from the ending values
simultaneously. As many digits as agree in both tabulations of the function may be
considered correct.

The tabulations can be made to practically any degree of accuracy on the
equipment available, as the newer machines can be formed into counters of any
capacity up to 80 digits. In practice, counters of 16, 20 or 24 digits will
ordinarily suffice for the accuracy desired and two or more functions can be evaluated
simultaneously. Cards are read and added at the rate of 150 per minute, or read,
added and listed on the tape at the rate of 80 per minute and new summary
cards produced at the rate of 40 per minute (on alphabetic equipment with gang
summary punches). Computation may be carried out with additional decimal
places and the final tabulation of the function rounded off to the nearest number
retained.

8. Summary. The cumulative or progressive-total method is shown to be
applicable to the preparation of tables of functions expressed in the form of
a power series.

The cumulative formulas for the powers through the twelfth power have been
presented, and simple methods are given for transforming a power series into its
corresponding cumulative formula, for changing the interval of the table,
rounding off the values of the function, and subdividing the table at desired
points.

It is hoped that this discussion will make tables in printed or punched-card 
form more generally available as a tool for the computer. Since tables may be so 
readily prepared by this process, the usefulness of the tabular method of solving 
problems is greatly increased. 

The author wishes to acknowledge his thanks to Professor P. S. Dwyer for 
various suggestions, particularly in connection with section 2. 

REFERENCES

[1] G. F. Hardy, Theory of Construction of Tables of Mortality, pp. 59-62, 124-128.

[2] R. M. Mendenhall and R. Warren, The Mendenhall-Warren-Hollerith Correlation
Method, Columbia University Statistical Bureau Document, No. 1, 1929,
Columbia University, New York.

[3] R. M. Mendenhall and R. Warren, "Computing statistical coefficients from punched
cards," Jour. Educ. Psych., Vol. 21 (1930), pp. 53-62.

[4] H. S. Razran and M. E. Wagner, "The summation method in statistics," Jour.
Exper. Psych., Vol. 14 (1931), pp. 270-283.

[5] A. E. Brandt, Uses of the Progressive Digit Method, Punched Card Method in Colleges
and Universities, pp. 423-436.

[6] P. S. Dwyer, "The computation of moments with the use of cumulative totals,"
Annals of Math. Stat., Vol. 9 (1938), pp. 288-304.

[7] P. S. Dwyer, "The cumulative numbers and their polynomials," Annals of Math.
Stat., Vol. 11 (1940), pp. 66-71.

[8] L. J. Comrie, Interpolation and Allied Tables, Reprinted from the Nautical Almanac
for 1937, His Majesty's Stationery Office.

[9] E. C. Bower, "Systematic subdivision of tables," Lick Observatory Bulletin 467
(1936), University of California Press.

[10] Whittaker and Robinson, The Calculus of Observations, Blackie and Son, Ltd.,
London, 1932.



ON THE PROBABILITY OF THE OCCURRENCE OF AT LEAST m
EVENTS AMONG n ARBITRARY EVENTS

By Kai Lai Chung
Tsing Hua University, Kunming, China

1. Introduction. Let $E_1, \ldots, E_n$ denote $n$ arbitrary events. Let
$p_{\nu_1' \cdots \nu_i' \nu_{i+1} \cdots \nu_j}$, where $0 \le i \le j \le n$ and $(\nu_1, \ldots, \nu_j)$ is a combination of the
integers $(1, \ldots, n)$, denote the probability of the non-occurrence of $E_{\nu_1}, \ldots, E_{\nu_i}$
and the occurrence of $E_{\nu_{i+1}}, \ldots, E_{\nu_j}$. Let $p_{[\nu_1 \cdots \nu_i]}$ denote the probability of
the occurrence of $E_{\nu_1}, \ldots, E_{\nu_i}$ and no others among the $n$ events. Let
$S_j = \sum p_{\nu_1 \cdots \nu_j}$ where the summation extends to all combinations of $j$ of the $n$ integers
$(1, \ldots, n)$. Let $p_m(\nu_1, \ldots, \nu_k)$, $(1 \le m \le k \le n)$, denote the probability of
the occurrence of at least $m$ events among the $k$ events $E_{\nu_1}, \ldots, E_{\nu_k}$.

By the set $(x_1, \ldots, x_b, \ldots, x_a) - (x_1, \ldots, x_b)$ (where $b \le a$) we mean the
set $(x_{b+1}, \ldots, x_a)$. And by a $\binom{a}{b}$-combination out of $(x_1, \ldots, x_a)$ we mean
a combination of $b$ integers out of the $a$ integers $(x_1, \ldots, x_a)$.

We often use summation signs with their meaning understood, thus for a fixed
$k$, $1 \le k \le n$, the summations in $\sum p_{\nu_1 \cdots \nu_k}$, or $\sum p_m(\nu_1, \ldots, \nu_k)$, extend to all
the $\binom{n}{k}$-combinations out of $(1, \ldots, n)$.

The following conventions concerning the binomial coefficients are made:

$$\binom{0}{0} = 1, \qquad \binom{a}{b} = 0 \ \text{if } a < b \text{ or if } b < 0.$$

It is a fundamental theorem in the theory of probability that, if $E_1, \ldots, E_n$
are incompatible (or "mutually exclusive"), then

$$p_1(1, \ldots, n) = p_1 + \cdots + p_n.$$

When the events are arbitrary, we have Boole's inequality

$$p_1(1, \ldots, n) \le p_1 + \cdots + p_n.$$

Gumbel¹ has generalized this inequality to the following:

$$p_1(1, \ldots, n) \le \frac{\sum p_1(\nu_1, \ldots, \nu_k)}{\binom{n-1}{k-1}}$$

for $k = 1, \ldots, n$. The case $k = 1$ gives Boole's inequality. Fréchet² has
announced that Gumbel's result can be sharpened to the following:

$$A_{k+1} = \frac{\sum p_1(\nu_1, \ldots, \nu_{k+1})}{\binom{n-1}{k}} \le \frac{\sum p_1(\nu_1, \ldots, \nu_k)}{\binom{n-1}{k-1}} = A_k, \tag{1}$$

for $k = 1, \ldots, n - 1$. Thus, $A_k$ is non-increasing for $k$ increasing. On the
other hand, Poincaré has obtained the following formula which expresses
$p_1(1, \ldots, n)$ in terms of the $S_j$'s,

$$p_1(1, \ldots, n) = \sum p_{\nu} - \sum p_{\nu_1 \nu_2} + \cdots + (-1)^{n-1} p_{1 \cdots n} = \sum_{j=1}^{n} (-1)^{j-1} S_j. \tag{2}$$

¹ C. R. Acad. Sc., Vol. 205 (1937), p. 774.


In the present paper we shall study the more general function $p_m(\nu_1, \ldots, \nu_k)$
as defined above. First we generalize Poincaré's formula and Fréchet's
inequalities. In Theorem 1 we establish (for $1 \le m \le n$)

$$p_m(1, \ldots, n) = \sum p_{\nu_1 \cdots \nu_m} - \binom{m}{1} \sum p_{\nu_1 \cdots \nu_{m+1}} + \binom{m+1}{2} \sum p_{\nu_1 \cdots \nu_{m+2}} - \cdots + (-1)^{n-m} \binom{n-1}{n-m} p_{1 \cdots n} = \sum_{i=0}^{n-m} (-1)^i \binom{m+i-1}{i} S_{m+i}. \tag{3}$$

Although this result is well known, we prove it in preparation for Theorem 2.
Theorem 3 establishes

$$A^{(m)}_{k+1} = \frac{\sum p_m(\nu_1, \ldots, \nu_{k+1})}{\binom{n-m}{k+1-m}} \le \frac{\sum p_m(\nu_1, \ldots, \nu_k)}{\binom{n-m}{k-m}} = A^{(m)}_k \tag{4}$$

for $k = 1, \ldots, n - 1$ and $1 \le m \le k$.

Next, we extend the inequalities (4), and in Theorem 4 we show that

$$A^{(m)}_k \le \tfrac{1}{2}\left(A^{(m)}_{k-1} + A^{(m)}_{k+1}\right); \tag{5}$$

which states that the differences $A_k - A_{k+1}$ ($k = 1, \ldots, n - 1$) are
non-increasing for increasing $k$. From this and a simple result we can deduce (4). Also
Theorem 2 establishes that

$$\sum_{i=0}^{2l+1} (-1)^i \binom{m+i-1}{i} S_{m+i} \le p_m(1, \ldots, n) \le \sum_{i=0}^{2l} (-1)^i \binom{m+i-1}{i} S_{m+i} \tag{6}$$

for $2l + 1 \le n - m$ and $2l \le n - m$ respectively. These inequalities throw
light on formula (3) and are sharper than the following analogue of Boole's
inequality for $p_m(1, \ldots, n)$, which is a special case of (4):

$$p_m(1, \ldots, n) \le \sum p_{\nu_1 \cdots \nu_m}. \tag{7}$$

The last statement will be evident in the proof.

² Loc. cit., Vol. 208 (1939), p. 1703.

In Theorem 5 we give an "inversion" of the formula (3), i.e. we express $p_{1 \cdots n}$
in terms of the $p_m(\nu_1, \ldots, \nu_k)$'s, as follows:

$$\binom{n-1}{m-1} p_{1 \cdots n} = \sum p_m(\nu_1, \ldots, \nu_m) - \sum p_m(\nu_1, \ldots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \ldots, n) = \sum_{i=0}^{n-m} (-1)^i \sum p_m(\nu_1, \ldots, \nu_{m+i}). \tag{8}$$

This of course implies the following more general formula for $p_{\alpha_1 \cdots \alpha_r}$,

$$\binom{r-1}{m-1} p_{\alpha_1 \cdots \alpha_r} = \sum_{i=0}^{r-m} (-1)^i \sum p_m(\nu_1, \ldots, \nu_{m+i})$$

where $(\alpha_1, \ldots, \alpha_r)$ is a combination of the integers $(1, \ldots, n)$ and where the
second summation extends to all the $\binom{r}{m+i}$-combinations of $(\alpha_1, \ldots, \alpha_r)$.

Since it is known³ that we can express other functions such as $S_r$ in
terms of the $p_{\nu_1 \cdots \nu_r}$'s, we can also express them in terms of the $p_m(\nu_1, \ldots, \nu_k)$'s,
provided $r \ge m$.

Finally, for the case $m = 1$, we give in Theorem 6 an explicit formula for
$p_{[1 \cdots r]}$ in terms of the $p_1(\nu_1, \ldots, \nu_k)$'s, as shown in (9),

$$p_{[1 \cdots r]} = -p_1(r+1, \ldots, n) + \sum_{\nu_1} p_1(\nu_1, r+1, \ldots, n) - \sum_{\nu_1, \nu_2} p_1(\nu_1, \nu_2, r+1, \ldots, n) + \cdots + (-1)^{r-1} p_1(1, \ldots, r, r+1, \ldots, n) = \sum_{i=0}^{r} (-1)^{i-1} \sum_{(\nu_1, \ldots, \nu_i)} p_1(\nu_1, \ldots, \nu_i, r+1, \ldots, n), \tag{9}$$

where $(\nu_1, \ldots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations from $(1, \ldots, r)$.
This of course implies the following more general formula:

$$p_{[\alpha_1 \cdots \alpha_r]} = \sum_{i=0}^{r} (-1)^{i-1} \sum p_1(\nu_1, \ldots, \nu_i, \alpha_{r+1}, \ldots, \alpha_n),$$

where $(\alpha_1, \ldots, \alpha_r, \ldots, \alpha_n)$ is a permutation of $(1, \ldots, n)$ and where
$(\nu_1, \ldots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations out of $(\alpha_1, \ldots, \alpha_r)$. From
Theorem 6 and two lemmas we deduce a condition of existence of systems of
events associated with the probabilities $p_1(\nu_1, \ldots, \nu_k)$. The author has not
been able to obtain similar elegant results for the general $m$. Probably they
do not exist.

³ Fréchet, "Condition d'existence de systèmes d'événements associés à certaines
probabilités," Jour. de Math., (1940), pp. 51-62.


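2. Generalization of Poincaré's formula; Generalization and sharpening of
Boole's inequality.

Theorem 1: For $1 \le m \le n$ we have formula (3), that is,

$$p_m(1, \ldots, n) = \sum_{i=0}^{n-m} (-1)^i \binom{m+i-1}{i} S_{m+i}.$$

Proof: We have

$$p_m(1, \ldots, n) = \sum_{b=0}^{n-m} \sum p_{[\nu_1 \cdots \nu_{m+b}]}, \tag{10}$$

where the second summation extends, for a fixed $b$, to all the $\binom{n}{m+b}$-combinations
of $(1, \ldots, n)$. Further we have

$$p_{\nu_1 \cdots \nu_{m+c}} = \sum_{d=0}^{n-m-c} \sum p_{[\nu_1 \cdots \nu_{m+c+d}]}, \tag{11}$$

where the second summation extends, for a fixed $d$, to all the $\binom{n-m-c}{d}$-combinations
of $(1, \ldots, n) - (\nu_1, \ldots, \nu_{m+c})$. The formulas (10) and (11) are
evident by observing that the probabilities in the summations are all additive.
Now we count the number of times a fixed $p_{[\nu_1 \cdots \nu_{m+b}]}$ appears in (3). By (11)
this is equal to the sum

$$\binom{m+b}{m} - \binom{m}{1}\binom{m+b}{m+1} + \cdots + (-1)^{n-m} \binom{n-1}{n-m}\binom{m+b}{n} = 1,$$

since this number is the coefficient of $(-1)^m x^{-m}$ in the expansion of

$$\left(1 - \frac{1}{x}\right)^{m+b} (1 - x)^{-m} = (-1)^{m+b} x^{-(m+b)} (1 - x)^{b}.$$

Thus by (10) we have (3).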
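Formula (3) can be checked numerically on a small example. The sketch below is an illustration only: it builds an arbitrary exact probability distribution on the $2^n$ atoms generated by $n$ events (events are numbered from 0 in the code) and compares both sides of (3). All names (`random_atoms`, `p_at_least`, `S`, `formula_3`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random


def random_atoms(n, seed=1):
    """Exact probabilities of the 2**n atoms; bit v of the index says whether event v occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def p_at_least(atoms, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum(p for a, p in enumerate(atoms)
               if sum((a >> v) & 1 for v in events) >= m)


def S(atoms, n, j):
    """S_j: sum over all j-combinations of the probability that all j events occur."""
    return sum(p_at_least(atoms, j, c) for c in combinations(range(n), j))


def formula_3(atoms, n, m):
    """Right-hand side of (3)."""
    return sum((-1) ** i * comb(m + i - 1, i) * S(atoms, n, m + i)
               for i in range(n - m + 1))


n = 4
atoms = random_atoms(n)
for m in range(1, n + 1):
    print(m, formula_3(atoms, n, m), p_at_least(atoms, m, range(n)))
```

Because the arithmetic is carried out in exact fractions, the two printed values agree exactly for every $m$, not merely to rounding error.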



Theorem 2: For $2l + 1 \le n - m$ and $2l \le n - m$ respectively, we have

$$\sum_{i=0}^{2l+1} (-1)^i \binom{m+i-1}{i} S_{m+i} \le p_m(1, \ldots, n) \le \sum_{i=0}^{2l} (-1)^i \binom{m+i-1}{i} S_{m+i}. \tag{6}$$

Proof: By the reasoning in the previous proof, it is sufficient (in fact also
necessary) to show that

$$\sum_{i=0}^{2l+1} (-1)^i \binom{m+i-1}{i}\binom{m+b}{m+i} \le 1 \le \sum_{i=0}^{2l} (-1)^i \binom{m+i-1}{i}\binom{m+b}{m+i}.$$

Since

$$\binom{m-1+i}{i}\binom{m+b}{m+i} = \frac{(m+b)!}{(m-1)!\, b!} \binom{b}{i} \frac{1}{m+i}$$

is an integer, it is sufficient to show that

$$\sum_{i=0}^{2l} (-1)^i \binom{b}{i} \frac{1}{m+i} > 0, \qquad \sum_{i=0}^{2l+1} (-1)^i \binom{b}{i} \frac{1}{m+i} \le 0. \tag{12}$$

Suppose $b > 0$ is even. For $i \le b/2 - 1$, we have $\frac{b-i}{i+1} > 1$ so that
$\frac{b-i}{i+1} \ge \frac{i+2}{i+1}$. Also $\frac{m+i}{m+i+1} \ge \frac{i+1}{i+2}$ for $m \ge 1$. Hence

$$\binom{b}{i+1} \frac{1}{m+i+1} = \frac{b-i}{i+1} \cdot \frac{m+i}{m+i+1} \binom{b}{i} \frac{1}{m+i} \ge \frac{i+2}{i+1} \cdot \frac{i+1}{i+2} \binom{b}{i} \frac{1}{m+i} = \binom{b}{i} \frac{1}{m+i}.$$

For $i \ge b/2$ we have $\frac{b-i}{i+1} < 1$ so that $\frac{b-i}{i+1} \cdot \frac{m+i}{m+i+1} < 1$ and

$$\binom{b}{i+1} \frac{1}{m+i+1} < \binom{b}{i} \frac{1}{m+i}.$$

Thus the absolute values of the terms of the alternating series

$$\sum_{i=0}^{b} (-1)^i \binom{b}{i} \frac{1}{m+i} = \frac{b!\,(m-1)!}{(m+b)!}$$

are monotone increasing as long as $i \le \frac{b}{2} - 1$, reaching maximum at $i = \frac{b}{2}$ and
then become monotone decreasing.

Therefore (12) evidently holds for $2l \le b/2$ and $2l + 1 \le b/2$ respectively.

For $t \ge \frac{b}{2} + 1$ we write

$$\sum_{i=0}^{t} (-1)^i \binom{b}{i} \frac{1}{m+i} = \frac{b!\,(m-1)!}{(m+b)!} - \sum_{i=t+1}^{b} (-1)^i \binom{b}{i} \frac{1}{m+i} = \frac{b!\,(m-1)!}{(m+b)!} - \sum_{j=0}^{b-t-1} (-1)^j \binom{b}{j} \frac{1}{m+b-j}.$$

From the above and the fact that $\frac{b!\,(m-1)!}{(m+b)!} \le \frac{1}{m+b}$ we see that the
right-hand side is an alternating series whose terms are non-decreasing in absolute
values. Hence (12) is true.

If $b$ is odd, the case is similar.
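The alternating bounds of Theorem 2 can be observed on a small example. This sketch is an illustration, not part of the proof: it truncates the sum in (3) after each term and checks on which side of $p_m(1, \ldots, n)$ the partial sum falls, for an arbitrary exact distribution. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random


def random_atoms(n, seed=1):
    """Exact probabilities of the 2**n atoms; bit v of the index says whether event v occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def p_at_least(atoms, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum(p for a, p in enumerate(atoms)
               if sum((a >> v) & 1 for v in events) >= m)


def S(atoms, n, j):
    """S_j as defined in the introduction."""
    return sum(p_at_least(atoms, j, c) for c in combinations(range(n), j))


def partial_sum(atoms, n, m, last):
    """Sum in (3) truncated after the term i = last."""
    return sum((-1) ** i * comb(m + i - 1, i) * S(atoms, n, m + i)
               for i in range(last + 1))


def bounds_hold(atoms, n, m):
    """True if even truncations are upper bounds and odd truncations lower bounds."""
    exact = p_at_least(atoms, m, range(n))
    for last in range(n - m + 1):
        value = partial_sum(atoms, n, m, last)
        if last % 2 == 0 and value < exact:
            return False
        if last % 2 == 1 and value > exact:
            return False
    return True


atoms = random_atoms(5)
print([bounds_hold(atoms, 5, m) for m in range(1, 6)])
```

With `last = 0` the upper bound is $S_m$ itself, which is inequality (7).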


3. Generalization of Fréchet's inequalities and related inequalities. Before
proving our remaining theorems, we shall give a more detailed account of
the general method which will be used. In the foregoing work we have
already given two different expressions for the function $p_m(1, \ldots, n)$, namely,
formulas (3) and (10), but they are not convenient for our later purposes.
Formula (3) is inconvenient because it is not additive and because the $p_{\nu_1 \cdots \nu_j}$'s
are related in magnitudes; while formula (10) has gone so far in the separation
of the additive constituents that its application raises algebraical difficulties.
Let us therefore take an intermediate course.

Let each $\binom{n}{m}$-combination $(\nu_1, \ldots, \nu_m)$ out of $(1, \ldots, n)$ be written so that
$\nu_1 < \nu_2 < \cdots < \nu_m$. Then we arrange them in an ordered sequence in the
following way: the combination $(\nu_1, \ldots, \nu_m)$ is to precede the combination
$(\mu_1, \ldots, \mu_m)$ if, for the first $\nu_i \ne \mu_i$, we have $\nu_i < \mu_i$. After such an
arrangement we symbolically denote these combinations by

$$\mathrm{I}, \ \mathrm{II}, \ \ldots, \ \left[\binom{n}{m}\right].$$

Further, all the $\binom{k}{m}$-combinations out of $(\nu_1, \ldots, \nu_k)$ where the latter is a
combination out of $(1, \ldots, n)$ are arranged in the order in which they appear in
the sequence just written. For example, all the $\binom{4}{2}$-combinations out of
$(1, 2, 3, 4)$ are ordered thus:

(12) (13) (14) (23) (24) (34).

Let $U$ denote a typical combination $(\mu_1, \ldots, \mu_m)$. By $E_U$ we mean the
combination of events $E_{\mu_1}, \ldots, E_{\mu_m}$ so that $p_U = p_{\mu_1 \cdots \mu_m}$. In general, let the
combinations $U_1, \ldots, U_{b-1}, U_b$ be given, then $p_{U_1' \cdots U_{b-1}' U_b}$ denotes the
probability of the non-occurrence of $U_1, \ldots, U_{b-1}$ and the occurrence of $U_b$.

Now let $\mathrm{I}, \mathrm{II}, \ldots, \left[\binom{k}{m} - 1\right] = \mathrm{Y}, \left[\binom{k}{m}\right] = \mathrm{Z}$ denote all the
$\binom{k}{m}$-combinations out of $(\nu_1, \ldots, \nu_k)$ in their assigned order. We have

$$p_m(\nu_1, \ldots, \nu_k) = p_{\mathrm{I}} + p_{\mathrm{I}'\mathrm{II}} + p_{\mathrm{I}'\mathrm{II}'\mathrm{III}} + \cdots + p_{\mathrm{I}' \cdots \mathrm{Y}'\mathrm{Z}}. \tag{13}$$

This fundamental formula is evident. Of course it is possible to identify the
$p$'s on the right-hand side with the ordinary $p_{\nu_1' \cdots \nu_i' \nu_{i+1} \cdots \nu_j}$'s but we shall refrain from
so doing and be content with the following example:

$$p_2(1, 2, 3, 4) = p_{12} + p_{12'3} + p_{12'3'4} + p_{1'23} + p_{1'23'4} + p_{1'2'34}.$$



Theorem 3. For $k = 1, \ldots, n - 1$ and $1 \le m \le k$ we have

$$\binom{n-m}{k-m} \sum p_m(\nu_1, \ldots, \nu_{k+1}) \le \binom{n-m}{k+1-m} \sum p_m(\nu_1, \ldots, \nu_k). \tag{4}$$

Proof. Substitute (13) and a similar formula for $k + 1$ into the two sides
respectively. After this substitution we observe that the number of terms is
the same on both sides, since

$$\binom{n-m}{k-m}\binom{n}{k+1}\binom{k+1}{m} = \binom{n-m}{k+1-m}\binom{n}{k}\binom{k}{m}.$$

Also, the number of terms with a given $U = (\mu_1, \ldots, \mu_m)$ unaccented is the
same, since

$$\binom{n-m}{k-m}\binom{n-m}{k+1-m} = \binom{n-m}{k+1-m}\binom{n-m}{k-m}.$$

Let the sum of all the terms with $U$ unaccented in the two summations be
denoted by $\sigma_{k+1} = \sigma_{k+1}(\mu_1, \ldots, \mu_m)$ and $\sigma_k = \sigma_k(\mu_1, \ldots, \mu_m)$ respectively. It
is sufficient to prove that

$$\binom{n-m}{k-m} \sigma_{k+1} \le \binom{n-m}{k+1-m} \sigma_k, \tag{14}$$

for any $U$. $\sigma_k$ contains $\binom{n-m}{k-m}$ terms each of the form $p_{\nu_1' \cdots \nu_l' \mu_1 \cdots \mu_m}$ where
$0 \le l \le \mu_m - m$ and where $(\nu_1, \ldots, \nu_l, \mu_1, \ldots, \mu_m)$ is a $\binom{\mu_m}{m+l}$-combination
out of $(1, \ldots, \mu_m)$. For fixed $(\mu_1, \ldots, \mu_m)$ and a fixed $l$ but varying $\nu$'s, $\sigma_k$
contains $\binom{\mu_m-m}{l}$ distinct terms of the form $p_{\nu_1' \cdots \nu_l' \mu_1 \cdots \mu_m}$, with exactly $l$ accented
subscripts, each repeated $\binom{n-\mu_m}{k-m-l}$ times. Let the sum of all such terms be
denoted by $\sigma_k^{(l)}$. Evidently $\sigma_k^{(l)}$ has $\binom{n-\mu_m}{k-m-l}\binom{\mu_m-m}{l}$ terms. As a check we have

$$\binom{n-\mu_m}{k-m}\binom{\mu_m-m}{0} + \binom{n-\mu_m}{k-m-1}\binom{\mu_m-m}{1} + \cdots + \binom{n-\mu_m}{k-\mu_m}\binom{\mu_m-m}{\mu_m-m} = \binom{n-m}{k-m},$$

which is the total number of terms in $\sigma_k$.

We decompose these $p$'s partially, as follows:

$$p_{\nu_1' \cdots \nu_l' \mu_1 \cdots \mu_m} = \sum_{b=0}^{\mu_m-m-l} \ \sum_{\mu_{m+1}, \ldots, \mu_{m+b}} p_{\nu_1' \cdots \nu_{\mu_m-m-b}' \mu_1 \cdots \mu_{m+b}},$$

where $(\nu_1, \ldots, \nu_{\mu_m-m-b}, \mu_1, \ldots, \mu_{m+b})$ is a permutation of $(1, \ldots, \mu_m)$ and where
the second summation extends, for a fixed $b$, to all the $\binom{\mu_m-m-l}{b}$-combinations
out of $(1, \ldots, \mu_m) - (\nu_1, \ldots, \nu_l, \mu_1, \ldots, \mu_m)$.

Now consider a given

$$p_{\rho_1' \cdots \rho_t' \lambda_1 \cdots \lambda_s \mu_1 \cdots \mu_m}$$

where $0 \le t \le \mu_m - m$ and $(\rho_1 \cdots \rho_t \lambda_1 \cdots \lambda_s \mu_1 \cdots \mu_m)$ is a permutation of
$(1, \ldots, \mu_m)$. It appears $\binom{n-\mu_m}{k-m-l}\binom{t}{l}$ times in $\sigma_k^{(l)}$. Hence it appears

$$\binom{n-\mu_m}{k-m}\binom{t}{0} + \binom{n-\mu_m}{k-m-1}\binom{t}{1} + \cdots + \binom{n-\mu_m}{k-m-t}\binom{t}{t} = \binom{n-\mu_m+t}{k-m}$$

times in $\sigma_k$.

Therefore to prove (14) it is sufficient to prove that

$$\binom{n-m}{k-m}\binom{n-\mu_m+t}{k+1-m} \le \binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-m}.$$

By an easy reduction we have

$$n - \mu_m + t - k + m \le n - k$$

or

$$-\mu_m + t + m \le 0;$$

since $t \le \mu_m - m$ this is obvious.

Theorem 4: For $2 \le k \le n - 1$ and $1 \le m \le k$ we have

$$\frac{\sum p_m(\nu_1, \ldots, \nu_k)}{\binom{n-m}{k-m}} \le \frac{1}{2} \frac{\sum p_m(\nu_1, \ldots, \nu_{k-1})}{\binom{n-m}{k-1-m}} + \frac{1}{2} \frac{\sum p_m(\nu_1, \ldots, \nu_{k+1})}{\binom{n-m}{k+1-m}}. \tag{5}$$

Proof: By the reasoning in the previous proof, it is sufficient to show that

$$\binom{n-m}{k-1-m}\binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-m} \le \frac{1}{2}\binom{n-m}{k-m}\binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-1-m} + \frac{1}{2}\binom{n-m}{k-m}\binom{n-m}{k-1-m}\binom{n-\mu_m+t}{k+1-m},$$

for $0 \le t \le \mu_m - m$. By an easy reduction this is equivalent to

$$2(n-k)(n - \mu_m + t - k + m + 1) \le (n - k + 1)(n - k) + (n - \mu_m + t - k + m + 1)(n - \mu_m + t - k + m)$$

or

$$(n - \mu_m + t - k + m + 1)(\mu_m - t - m) \le (n - k)(\mu_m - t - m).$$

For $t = \mu_m - m$ we have equality, otherwise we have

$$-\mu_m + t + m + 1 \le 0.$$

We can deduce Theorem 3 from Theorem 4 and the following result (a case
of generalized Gumbel inequalities):

$$(n - m)\, p_m(1, \ldots, n) \le \sum p_m(\nu_1, \ldots, \nu_{n-1}). \tag{15}$$

Proof of (15). Substitute from (13). Consider the $p$'s with $U$ unaccented.
The number of such terms is the same on both sides. But on the left-hand side
they are all the same $p_{\nu_1' \cdots \nu_{\mu_m-m}' \mu_1 \cdots \mu_m}$, while those on the right-hand side, being of
the form $p_{\nu_1' \cdots \nu_l' \mu_1 \cdots \mu_m}$ where $0 \le l \le \mu_m - 1$ and $(\nu_1, \ldots, \nu_l)$ is a combination
out of $(1, \ldots, \mu_m - 1)$, are greater than or equal to it. Hence the result.
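Theorems 3 and 4 together say that the sequence $A^{(m)}_k$ is non-increasing and convex in $k$. The following sketch checks both properties on an arbitrary exact distribution; it restricts $k$ to $m \le k \le n$, where the divisor $\binom{n-m}{k-m}$ is non-zero, which is a choice made for this illustration. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random


def random_atoms(n, seed=1):
    """Exact probabilities of the 2**n atoms; bit v of the index says whether event v occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def p_at_least(atoms, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum(p for a, p in enumerate(atoms)
               if sum((a >> v) & 1 for v in events) >= m)


def A(atoms, n, m, k):
    """A_k^(m): sum of p_m over all k-combinations, divided by C(n - m, k - m)."""
    total = sum(p_at_least(atoms, m, c) for c in combinations(range(n), k))
    return total / comb(n - m, k - m)


def non_increasing(atoms, n, m):
    """Inequality (4) for k = m, ..., n - 1."""
    return all(A(atoms, n, m, k + 1) <= A(atoms, n, m, k) for k in range(m, n))


def convex(atoms, n, m):
    """Inequality (5) for k = m + 1, ..., n - 1."""
    return all(2 * A(atoms, n, m, k) <= A(atoms, n, m, k - 1) + A(atoms, n, m, k + 1)
               for k in range(m + 1, n))


atoms = random_atoms(5)
for m in range(1, 5):
    print(m, [float(A(atoms, 5, m, k)) for k in range(m, 6)])
```

For $m = 1$ the printed sequence starts at $\sum p_\nu$ and ends at $p_1(1, \ldots, n)$, so its first and last members give Boole's inequality.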


4. The $p_{\alpha_1 \cdots \alpha_r}$'s in terms of the $p_m(\nu_1, \ldots, \nu_k)$'s and the $p_{[\alpha_1 \cdots \alpha_r]}$'s in terms
of the $p_1(\nu_1, \ldots, \nu_k)$'s.

Theorem 5: For $1 \le m \le n$ we have

$$\binom{n-1}{m-1} p_{1 \cdots n} = \sum p_m(\nu_1, \ldots, \nu_m) - \sum p_m(\nu_1, \ldots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \ldots, n) = \sum_{i=0}^{n-m} (-1)^i \sum_{\nu_1, \ldots, \nu_{m+i}} p_m(\nu_1, \ldots, \nu_{m+i}). \tag{8}$$

Proof: As in the proof of Theorem 3, consider $\sigma_k(\mu_1, \ldots, \mu_m)$. Here
$m \le k \le n$. Since a given

$$p_{\rho_1' \cdots \rho_t' \lambda_1 \cdots \lambda_s \mu_1 \cdots \mu_m} \tag{16}$$

appears $\binom{n-\mu_m+t}{k-m}$ times in $\sigma_k$, it appears

$$\sum_{k=m}^{n} (-1)^{k-m} \binom{n-\mu_m+t}{k-m} = \sum_{j=0}^{n-m} (-1)^j \binom{n-\mu_m+t}{j} = \sum_{j=0}^{n-\mu_m+t} (-1)^j \binom{n-\mu_m+t}{j} = \begin{cases} 0, & \text{if } n - \mu_m + t \ge 1, \\ 1, & \text{if } n - \mu_m + t = 0, \end{cases}$$

times on the right-hand side of (8). Hence for fixed $(\mu_1, \ldots, \mu_m)$, the only
$p$'s of the form (16) which actually appear are those with $t = \mu_m - n$. But
$\mu_m \le n$, thus $t = 0$, $\mu_m = n$, and $(\lambda_1 \cdots \lambda_s \mu_1 \cdots \mu_m)$ is a permutation of
$(1, \ldots, n)$. The term in question is therefore $p_{1 \cdots n}$. Since the number of
$\binom{n}{m}$-combinations of $(1, \ldots, n)$ with $\mu_m = n$ is $\binom{n-1}{m-1}$, we have the theorem.
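The inversion formula (8) can also be verified by direct enumeration. In this sketch, which is only an illustration, the probability $p_{1 \cdots n}$ is read off as the weight of the single atom in which every event occurs; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
import random


def random_atoms(n, seed=1):
    """Exact probabilities of the 2**n atoms; bit v of the index says whether event v occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def p_at_least(atoms, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum(p for a, p in enumerate(atoms)
               if sum((a >> v) & 1 for v in events) >= m)


def rhs_8(atoms, n, m):
    """Right-hand side of (8): alternating sums of p_m over (m + i)-combinations."""
    return sum((-1) ** i * sum(p_at_least(atoms, m, c)
                               for c in combinations(range(n), m + i))
               for i in range(n - m + 1))


def lhs_8(atoms, n, m):
    """Left-hand side of (8); the last atom is the one where all n events occur."""
    return comb(n - 1, m - 1) * atoms[2 ** n - 1]


atoms = random_atoms(4)
for m in range(1, 5):
    print(m, lhs_8(atoms, 4, m), rhs_8(atoms, 4, m))
```

The factor $\binom{n-1}{m-1}$ on the left is what the counting argument at the end of the proof produces.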

Theorem 6: For 1 £ r £ n — 1, we have 

Pi v ■ r] = — pi{r + 1, ■ • • , «) + X piOn ,r + l, ■•■,«) 

>1 

- X p(.vi,i> 2 ,r + 1, ■■•,«)+... + (-l^XpiOL, ••*,») 
— X ( — l)’ -1 X ’• • ,v it r-f 1, , to), 


(9) 



PROBABILITY OF ARBITRARY EVENTS 


337 


where (Vi , • ■ • , v ,) runs through all the Q j-combmahons out of (1, ■ - ■ , r ) 
Proof: We rewrite (14) for the special case m = 1, 

( 1? ) Pl(w . ■ • • , Mfc) = + PmV* + ■ • • + Pm' 4-1M , 

where mi < Ms < ■ • ■ < p* ■ Substitute into the right hand side of (9). After 
the substitution let the sum of all those p’s with y unaccented be denoted by 
<r„ . The terms in <r M are of the form where 1 g s ^ /i and 

(p i , • • ■ , p,~ i) is a combination out of (1, ■ • • , n — 1), 

First consider a fixed $\mu \le r$. For a fixed $p_{\mu_1' \cdots \mu_{s-1}' \mu}$ we count the number of
times it appears in $\sigma_\mu$, that is, on the right hand side of (9). This is evidently
equal to

$$\sum_{i=0}^{r-\mu} (-1)^{s-1+i} \binom{r - \mu}{i} = (-1)^{s-1} \times \begin{cases} 0, & \text{if } r - \mu \ge 1, \\ 1, & \text{if } r - \mu = 0. \end{cases}$$

Thus the only terms that actually appear are those with $\mu = r$; and each of such
terms $p_{\mu_1' \cdots \mu_{s-1}' r}$ appears exactly once with the sign $(-1)^{s-1}$. Hence their total
contribution is


$$p_r - \sum_{\nu_1} p_{\nu_1' r} + \sum_{\nu_1, \nu_2} p_{\nu_1' \nu_2' r} - \cdots + (-1)^{r-1} p_{1' \cdots (r-1)' r} = p_{1 \cdots r} \tag{18}$$

by an easy modification of Poincaré's formula.

Next consider a fixed $\mu \ge r + 1$. Every term with $\mu$ unaccented in $\sigma_\mu$ is of
the form (with the usual convention for $\mu = r + 1$) $p_{\nu_1' \cdots \nu_s' (r+1)' \cdots (\mu-1)' \mu}$, where
$(\nu_1, \dots, \nu_s)$ is a combination out of $(1, \dots, r)$; and it appears exactly once
with the sign $(-1)^{s-1}$. Their total contribution is therefore

$$-p_{(r+1)' \cdots (\mu-1)' \mu} + \sum_{\nu_1} p_{\nu_1' (r+1)' \cdots (\mu-1)' \mu} - \sum_{\nu_1, \nu_2} p_{\nu_1' \nu_2' (r+1)' \cdots (\mu-1)' \mu} + \cdots + (-1)^{r-1} p_{1' \cdots r' (r+1)' \cdots (\mu-1)' \mu} = -p_{1 \cdots r (r+1)' \cdots (\mu-1)' \mu}$$

by another application of Poincaré's formula. Summing up for $\mu = r + 1, \dots, n$, we obtain

$$-\left(p_{1 \cdots r (r+1)} + p_{1 \cdots r (r+1)' (r+2)} + \cdots + p_{1 \cdots r (r+1)' \cdots (n-1)' n}\right). \tag{19}$$

Adding (18) and (19), we obtain as the sum of the right-hand side of (9)

$$p_{1 \cdots r} - \left(p_{1 \cdots r (r+1)} + p_{1 \cdots r (r+1)' (r+2)} + \cdots + p_{1 \cdots r (r+1)' \cdots (n-1)' n}\right) = p_{1 \cdots r (r+1)' (r+2)' \cdots n'} = p_{[1 \cdots r]}$$

by an easy modification of (17).
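
The identity of Theorem 6 can be checked numerically on a small example. The sketch below is only an illustration: the function name `check_theorem_6`, the random atom weights, and the representation of events as coordinates of 0/1 patterns are assumptions made here, not part of the paper. It evaluates $p_{[1 \cdots r]}$ directly and compares it with the alternating sum of the $p_1$'s.

```python
from fractions import Fraction
from itertools import combinations, product
import random


def check_theorem_6(n, r, seed=0):
    """Return both sides of Theorem 6 for n events on a random finite space."""
    rng = random.Random(seed)
    # Each atom is an occurrence pattern (x_1, ..., x_n); event E_i occurs when x_i = 1.
    atoms = list(product((0, 1), repeat=n))
    weights = [Fraction(rng.randint(1, 9)) for _ in atoms]
    total = sum(weights)
    prob = {a: w / total for a, w in zip(atoms, weights)}

    def p1(indices):
        # p_1(indices): probability that at least one of the listed events occurs.
        return sum(pr for a, pr in prob.items() if any(a[i - 1] for i in indices))

    # Left side: E_1, ..., E_r occur and E_{r+1}, ..., E_n do not.
    lhs = prob[tuple([1] * r + [0] * (n - r))]

    # Right side: sum over j of (-1)^(j-1) times the sum over j-combinations of (1..r).
    tail = tuple(range(r + 1, n + 1))
    rhs = Fraction(0)
    for j in range(r + 1):
        sign = -1 if j % 2 == 0 else 1
        for nu in combinations(range(1, r + 1), j):
            rhs += sign * p1(nu + tail)
    return lhs, rhs


if __name__ == "__main__":
    for n in range(2, 5):
        for r in range(1, n):
            left, right = check_theorem_6(n, r)
            print(n, r, left == right)
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance check.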


5. A condition for existence of systems of events associated with the probabilities $p_1(\nu_1, \dots, \nu_k)$.

Lemma 1: Let any $2^n - 1$ quantities $q(\alpha_1, \dots, \alpha_k)$ be given, where $k = 1, \dots, n$, and for a fixed $k$, $(\alpha_1, \dots, \alpha_k)$ runs through all the $\binom{n}{k}$-combinations out of $(1, \dots, n)$. Let the quantities $Q(\alpha_1, \dots, \alpha_r)$ be formed as follows:

$$Q(0) = 1 - q(1, \dots, n),$$

$$Q(\alpha_1, \dots, \alpha_r) = -q(\alpha_{r+1}, \dots, \alpha_n) + \sum_{\nu_1} q(\nu_1, \alpha_{r+1}, \dots, \alpha_n) - \sum_{\nu_1, \nu_2} q(\nu_1, \nu_2, \alpha_{r+1}, \dots, \alpha_n) + \cdots + (-1)^{r-1} q(1, \dots, n),$$

where $(\nu_1, \dots, \nu_s)$ runs through all the $\binom{r}{s}$-combinations out of $(1, \dots, n) - (\alpha_{r+1}, \dots, \alpha_n)$. Then the sum of all these $Q$'s is equal to 1.

Proof: Add all these $Q$'s and count the number of times a fixed $q(\mu_1, \dots, \mu_k)$
appears in the sum. For $1 \le k \le n$ this number is equal to

$$-1 + \binom{k}{1} - \binom{k}{2} + \cdots + (-1)^{k-1} \binom{k}{k} = 0.$$

Hence we have the lemma.
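
The counting argument of Lemma 1 can be illustrated with arbitrary numbers. The following is a minimal sketch under stated assumptions: the function name `sum_of_Q` is hypothetical, the $q$'s are random rationals, and the convention $q(\text{empty set}) = 0$ is assumed here so that $Q(1, \dots, n)$ is covered by the same formula.

```python
from fractions import Fraction
from itertools import combinations
import random


def sum_of_Q(n, seed=0):
    """Form every Q of Lemma 1 from arbitrary q's and return their sum."""
    rng = random.Random(seed)
    full = frozenset(range(1, n + 1))
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(sorted(full), k)]
    # Arbitrary (even negative) rational values for the 2^n - 1 quantities q.
    q = {s: Fraction(rng.randint(-5, 5), rng.randint(1, 5)) for s in subsets if s}
    q[frozenset()] = Fraction(0)  # assumed convention for the empty combination

    def Q(alpha):
        if not alpha:
            return 1 - q[full]
        rest = full - alpha  # plays the role of (alpha_{r+1}, ..., alpha_n)
        value = Fraction(0)
        for s in range(len(alpha) + 1):
            sign = -1 if s % 2 == 0 else 1  # (-1)^(s-1)
            for nu in combinations(sorted(alpha), s):
                value += sign * q[rest | frozenset(nu)]
        return value

    return sum(Q(a) for a in subsets)


if __name__ == "__main__":
    print([sum_of_Q(n, seed=n) for n in range(1, 5)])
```

Because every $q$ with at least one index enters with total coefficient zero, the sum does not depend on the values chosen for the $q$'s.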

Lemma 2: (Fréchet) Given $2^n$ quantities $Q_{[\alpha_1 \cdots \alpha_r]}$, where $(\alpha_1, \dots, \alpha_r)$ runs through all combinations out of $(1, \dots, n)$ including the empty one. The necessary and sufficient condition that there exist systems of events $E_1, \dots, E_n$ for which

$$p_{[\alpha_1 \cdots \alpha_r]} = Q_{[\alpha_1 \cdots \alpha_r]}$$

(where $p_{[0]}$ denotes the probability for the non-occurrence of $E_1, \dots, E_n$) is
that each $Q \ge 0$ and that their sum is equal to 1.

Proof: Since the probabilities $p_{[\alpha_1 \cdots \alpha_r]}$ are independent, i.e., unrelated in
magnitudes except that their sum is equal to 1, the lemma is evident.

Theorem 7: Given $2^n - 1$ quantities $q(\alpha_1, \dots, \alpha_k)$ as in Lemma 1, the necessary and sufficient condition that there exist systems of events $E_1, \dots, E_n$ for which

$$p_1(\alpha_1, \dots, \alpha_k) = q(\alpha_1, \dots, \alpha_k)$$

is that for any combination $(\alpha_{r+1}, \dots, \alpha_n)$, $1 \le r \le n - 1$, out of $(1, \dots, n)$ we
have

$$-q(\alpha_{r+1}, \dots, \alpha_n) + \sum_{\nu_1} q(\nu_1, \alpha_{r+1}, \dots, \alpha_n) - \sum_{\nu_1, \nu_2} q(\nu_1, \nu_2, \alpha_{r+1}, \dots, \alpha_n) + \cdots + (-1)^{r-1} q(1, \dots, n) \ge 0,$$

and thus

$$1 - q(1, \dots, n) \ge 0.$$

Proof: The condition is necessary by Theorem 6. It is sufficient by Lemmas
1 and 2 and an obvious formula expressing $p_1(\alpha_1, \dots, \alpha_k)$ in terms of the $p_{[\alpha_1 \cdots \alpha_r]}$'s.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A NOTE ON SHEPPARD’S CORRECTIONS 


By Cecil C. Craig 


University of Michigan 

As far as the author is aware, H. C. Carver was the first to point out that
while the formulae ordinarily given for Sheppard's corrections for central moments
are valid for moments computed about the population mean, there are
still systematic errors present when they are applied to central moments calculated
from any particular grouped frequency distribution [1]. This is due, of
course, to the fact that the mean of a grouped frequency distribution is in general
different from that of the distribution before grouping. For a fixed class interval
$k$, Sheppard's corrections give the average value of a moment of a given order about a fixed
point for all the groupings of this class width possible and will
fail to do so if the moment in question is calculated for each position of the class
limits about a point which varies as the class limits shift. Thus Carver [1]
pointed out that the commonly used formula (for a continuous variate),

$$\mu_2 = \nu_2 - \frac{k^2}{12}, \tag{1}$$

should, if $\nu_2$ is calculated about the mean of the grouped distribution as it is in
practice, be replaced by

$$\mu_2 = \nu_2 - \frac{k^2}{12} + \sigma_M^2, \tag{2}$$

in which $\sigma_M^2$ is the variance of the means of grouped distributions over all positions
of the class limits with the fixed class width $k$.
Recently J. A. Pierce [2] gave a method for deriving the required formulae of 
the type of (2) and gave actual formulae for both moments and seminvariants 
through the sixth order. It is the purpose of this note to point out that the use 
of moment generating functions provides a more elegant and concise way of 
arriving at formulae equivalent to Pierce’s though in a somewhat different form. 
This method can be immediately extended to distributions of two or more 
variates. 

In a previous paper [3] on Sheppard's corrections for a discrete variate, the
author made use of the following argument: It is assumed that for a fixed class
width $k$, any point in the scale on which the variate $x$ is plotted is as likely to be
chosen as a class limit as any other; choosing a system of class limits for grouping
the data is then equivalent to placing at random on the $x$-axis a scale with
division points at intervals of $k$. Once the system of class limits is chosen any
value of $x$ before grouping bears to the class mark, $x_i$, of the class in which it
falls the relation,

$$x_i = x + \epsilon, \tag{3}$$

in which $x$ and $\epsilon$ are independent variates. The frequency law governing $x$ is,
of course, that of the population from which it is drawn while $\epsilon$ is distributed
in a rectangular distribution with the range $\left(-\frac{k}{2}, \frac{k}{2}\right)$ for a continuous variate and $\left(-\frac{m-1}{2m} k, \frac{m-1}{2m} k\right)$ if $m$ consecutive values of a discrete variate are
grouped in each class interval. In either case

$$M_{x_i}(\vartheta) = M_x(\vartheta) M_\epsilon(\vartheta), \tag{4}$$

in which $M_{x_i}(\vartheta)$ is the moment generating function of the variate $x_i$, etc. The
expansion of both sides of (4) in powers of $\vartheta$ gives the relations between the
average values of moments of the grouped distribution over all positions of the
scale and the moments of the ungrouped distribution from which Sheppard's
corrections are obtained by solving for the moments of the ungrouped distribution.
The relations are valid for any fixed point about which the moments are
computed; if this fixed point be taken as the mean of the ungrouped distribution
the ordinary Sheppard's corrections for central moments result.

But it is quite easy to modify (4) to give the necessary relations in case the
moments of each grouped distribution are computed about the mean of that
distribution. We have only to write

$$x_i = (x_i - \bar{x}) + \bar{x}, \tag{5}$$

in which $\bar{x}$ is the mean of the grouped distribution for which $x_i$ is one of the class
marks. Then

$$M_{x_i}(\vartheta) = M_{x_i - \bar{x},\, \bar{x}}(\vartheta, \omega) \big|_{\omega = \vartheta} = M_x(\vartheta) M_\epsilon(\vartheta). \tag{6}$$

If we write

$$\lambda_{rs:\, x_i - \bar{x},\, \bar{x}} = \lambda_{rs},$$

in which $\lambda_{rs}$ is the product seminvariant of order $rs$ of moments about the means
of the grouped distributions and of such means, the expansion of the logarithm
of the second member of (6) gives

$$(\lambda_{10} + \lambda_{01}) \vartheta + (\lambda_{20} + 2\lambda_{11} + \lambda_{02}) \frac{\vartheta^2}{2!} + (\lambda_{10} + \lambda_{01})^{(3)} \frac{\vartheta^3}{3!} + \cdots, \tag{7}$$

in which

$$(\lambda_{10} + \lambda_{01})^{(r)} = \lambda_{r0} + r \lambda_{r-1,1} + \cdots + \binom{r}{k} \lambda_{r-k,k} + \cdots + \lambda_{0r}.$$

The expansion of the logarithm of the right member is


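$$\lambda_1 \vartheta + \lambda_2 \frac{\vartheta^2}{2!} + \lambda_3 \frac{\vartheta^3}{3!} + \cdots + \sum_{s=1}^{\infty} (-1)^{s+1} \frac{B_s}{2s} \left(1 - \frac{1}{m^{2s}}\right) \frac{k^{2s} \vartheta^{2s}}{(2s)!} \tag{8}$$

for a discrete variate (the result for a continuous variable is obtained merely by
letting $m \to \infty$) in which $\lambda_r$ is the $r$th seminvariant of the ungrouped distribution
and $B_s$ is the $s$th Bernoulli number.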

We may without loss of generality take the origin for $x$ at the mean of the
ungrouped distribution so that $\lambda_1 = 0$. Further it is easy to see that

$$\lambda_{1r} = 0, \quad r = 0, 1, 2, 3, \dots.$$

Consider

$$E[(x_i - \bar{x}) \bar{x}^r] = \bar\nu_{1r}.$$

For a fixed $\bar{x}$, i.e., for a given grouping, this becomes

$$\bar{x}^r E(x_i - \bar{x}) = 0.$$

Then since $\bar\nu_{1r}$ is the average of this over all groupings with a given class interval,
$\bar\nu_{1r} = 0$, and from the expression for $\lambda_{1r}$ in terms of the moments $\bar\nu_{rs}$, it is obvious
that also $\lambda_{1r} = 0$.

Then we must also have $\lambda_{01} = 0$ as is otherwise obvious and (7) can be rewritten

$$(\lambda_{20} + \lambda_{02}) \frac{\vartheta^2}{2!} + (\lambda_{30} + 3\lambda_{21} + \lambda_{03}) \frac{\vartheta^3}{3!} + \cdots. \tag{9}$$

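Now from (8) and (9) by equating coefficients of like powers of $\vartheta$, we get the
set of formulae:

$$\begin{aligned} \lambda_1 &= 0, \\ \lambda_2 &= \lambda_{20} + \lambda_{02} - \left(1 - \frac{1}{m^2}\right) \frac{k^2}{12}, \\ \lambda_3 &= \lambda_{30} + 3\lambda_{21} + \lambda_{03}, \\ \lambda_4 &= \lambda_{40} + 4\lambda_{31} + 6\lambda_{22} + \lambda_{04} + \left(1 - \frac{1}{m^4}\right) \frac{k^4}{120}. \end{aligned} \tag{10}$$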

These formulae, however, do not give the sought Sheppard’s corrections for 
seminvariants calculated from grouped distributions of a discrete variate. See 
below. 

Referring to formula (10), p. 58 of the author's paper cited [3], it is easily seen
by comparison that the required moment formulae are obtained from the general
formula

(11)

in which $a_{2s}$ is given by formula (9) of this former paper. For $n = 1, 2, 3, 4$
we write down immediately


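$$\begin{aligned} \mu_1 &= 0 \quad (\bar\nu_{10} = \bar\nu_{01} = 0), \\ \mu_2 &= \bar\nu_{20} + \bar\nu_{02} - \left(1 - \frac{1}{m^2}\right) \frac{k^2}{12}, \\ \mu_3 &= \bar\nu_{30} + 3\bar\nu_{21} + \bar\nu_{03}, \\ \mu_4 &= \bar\nu_{40} + 4\bar\nu_{31} + 6\bar\nu_{22} + \bar\nu_{04} - \left(1 - \frac{1}{m^2}\right) (\bar\nu_{20} + \bar\nu_{02}) \frac{k^2}{2} + \left(1 - \frac{1}{m^2}\right) \left(7 - \frac{3}{m^2}\right) \frac{k^4}{240}. \end{aligned} \tag{12}$$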


In these formulae, $\bar\nu_{r0}$ is, of course, the average value of $r$th central moments
about the means of grouped distributions. From the definition, $\bar\nu_{rs}$ ($s \ne 0$) is the
average value of the product of the $r$th central moment of a grouped distribution
by the $s$th power of the mean of the same grouped distribution. Also, it must
be noted that in the formulae (10) the $\lambda_{rs}$'s there are to be calculated by the
usual formulae from the moments $\bar\nu_{rs}$, and are not themselves the average values
of like seminvariants calculated from the separate grouped distributions. Thus
though the formulae (12) give the sought Sheppard's corrections for moments,
the formulae (10) do not do the like for seminvariants in general. However,
since in each grouped distribution,

$$\lambda_2 = \nu_2 \quad \text{and} \quad \lambda_3 = \nu_3,$$

we have, taking the expectation or average value over the grouped distributions,

$$E(\lambda_2) = E(\nu_2) = \bar\nu_{20} = \lambda_{20} \quad \text{and} \quad E(\lambda_3) = E(\nu_3) = \bar\nu_{30} = \lambda_{30},$$

and the first two formulae of (10) do give the Sheppard's corrections for $\lambda_2$ and
$\lambda_3$ calculated from grouped distributions of a discrete variate.

But the case for $\lambda_4$ is different. In each grouped distribution,

$$\lambda_4 = \nu_4 - 3\nu_2^2,$$

and if we define $l_r$ by

$$E(\lambda_r) = l_r,$$

we have

$$l_4 = \bar\nu_{40} - 3E(\nu_2^2) = \bar\nu_{40} - 3(\bar\nu_{20}^2 + \nu_{2:\nu_2}) = \lambda_{40} - 3\nu_{2:\nu_2},$$

if $\nu_{2:\nu_2}$ is the variance of $\nu_2$ in the grouped distributions.

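In a similar way one can obtain such formulae for seminvariants as may be
required. Through the sixth, the formulae for the Sheppard's corrections for
the seminvariants calculated from a grouped distribution of a discrete variate are:

$$\begin{aligned} \lambda_2 &= l_2 + \lambda_{02} - \left(1 - \frac{1}{m^2}\right) \frac{k^2}{12}, \\ \lambda_3 &= l_3 + 3\lambda_{21} + \lambda_{03}, \\ \lambda_4 &= l_4 + 3\nu_{2:\nu_2} + 4\lambda_{31} + 6\lambda_{22} + \lambda_{04} + \left(1 - \frac{1}{m^4}\right) \frac{k^4}{120}, \\ \lambda_5 &= l_5 + 10\nu_{11:\nu_2\nu_3} + 5\lambda_{41} + 10\lambda_{32} + 10\lambda_{23} + \lambda_{05}, \\ \lambda_6 &= l_6 + 15\nu_{11:\nu_2\nu_4} + 10\nu_{2:\nu_3} - 30\nu_{3:\nu_2} - 90\nu_{2:\nu_2} \bar\nu_{20} \\ &\quad + 6\lambda_{51} + 15\lambda_{42} + 20\lambda_{33} + 15\lambda_{24} + \lambda_{06} - \left(1 - \frac{1}{m^6}\right) \frac{k^6}{252}. \end{aligned} \tag{13}$$

In these formulae, $\nu_{ij:\nu_r\nu_s}$ is the $ij$th central product moment of $\nu_r$ and $\nu_s$ in the
grouped distributions.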

To illustrate these formulae numerically and to facilitate comparison with
Pierce's results, we will use the example he chose. His ungrouped distribution
was:

    v    f   |   v    f   |   v    f
    1    2   |   4   30   |   7    1
    2    8   |   5    4   |   8    1
    3   10   |   6    3   |   9    1


From this the following three grouped distributions with $k = 3$ can be formed:

         (1)        |        (2)        |        (3)
    class     f     |   class     f     |   class     f
     1- 3    20     |    0- 2    10     |   -1- 1     2
     4- 6    37     |    3- 5    44     |    2- 4    48
     7- 9     3     |    6- 8     5     |    5- 7     8
    10-12     0     |    9-11     1     |    8-10     2






With origin at $v = 4$, we have the following table of moment characteristics
of these four distributions:

    Distribution   nu'_1    nu_2 = lambda_2   nu_3 = lambda_3   nu_4               lambda_4           delta nu'_1 = nu'_1 + 10/60
    (1)              9/60    9819/60^2         -17442/60^3      238849317/60^4     -50388966/60^4      19/60
    (2)             -9/60   10179/60^2         567162/60^3      557840277/60^4     247004154/60^4       1/60
    (3)            -30/60    8820/60^2        1317600/60^3      528282000/60^4     294904800/60^4     -20/60
    Average        -10/60    9606/60^2         622440/60^3      441657198/60^4     163839996/60^4

    Distribution   mu'_1    mu_2 = lambda_2   mu_3 = lambda_3   mu_4               lambda_4
    Original       -10/60    7460/60^2         642400/60^3      305034000/60^4     138079200/60^4

From the table,

$$\bar\nu_{20} = \lambda_{20} = \frac{9606}{60^2}, \qquad \bar\nu_{30} = \lambda_{30} = \frac{622440}{60^3}, \qquad \bar\nu_{40} = \lambda_{40} + 3\lambda_{20}^2 = \frac{441657198}{60^4}.$$


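We further compute:

$$\begin{aligned} \bar\nu_{02} &= \frac{\sum (\delta\nu_1')^2}{3} = \frac{254}{60^2} = \lambda_{02}, & \bar\nu_{03} &= \frac{-380}{60^3} = \lambda_{03}, & \bar\nu_{04} &= \frac{96774}{60^4}, \\ \bar\nu_{21} &= \frac{\sum \nu_2\, \delta\nu_1'}{3} = \frac{6780}{60^3} = \lambda_{21}, & \bar\nu_{31} &= \frac{-8705412}{60^4} = \lambda_{31}, & \bar\nu_{22} &= \frac{2360946}{60^4}, \\ \lambda_{22} &= \bar\nu_{22} - \bar\nu_{20} \bar\nu_{02} = \frac{-78978}{60^4}, & \lambda_{04} &= \bar\nu_{04} - 3\bar\nu_{02}^2 = \frac{-96774}{60^4}, \\ \nu_{2:\nu_2} &= \frac{\sum \nu_2^2}{3} - \bar\nu_{20}^2 = \frac{330498}{60^4}, & l_4 &= \lambda_{40} - 3\nu_{2:\nu_2} = \bar\nu_{40} - 3\bar\nu_{20}^2 - 3\nu_{2:\nu_2} = \frac{163839996}{60^4}, \\ \left(1 - \frac{1}{m^2}\right) \frac{k^2}{12} &= \frac{2}{3}, & \left(1 - \frac{1}{m^2}\right) \left(7 - \frac{3}{m^2}\right) \frac{k^4}{240} &= 2. \end{aligned}$$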

With these values one may check the formulae (12) and (13) as far as weight
four. For example:

$$\mu_2 = \frac{9606}{60^2} + \frac{254}{60^2} - \frac{2}{3} = \frac{7460}{60^2},$$

$$\lambda_4 = \frac{1}{60^4} (163839996 + 991494 - 34821648 - 473868 - 96774 + 8640000) = \frac{138079200}{60^4}.$$
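
The arithmetic of this example can be reproduced exactly. The sketch below is an illustration only: the helper names (`moments`, `grouped`, `check_example`) are assumptions introduced here, while the data, $k = m = 3$, and the two formulae being checked (the second of (12) and the third of (13)) come from the text.

```python
from fractions import Fraction as F

# Ungrouped distribution from the text: value v -> frequency f.
FREQ = {1: 2, 2: 8, 3: 10, 4: 30, 5: 4, 6: 3, 7: 1, 8: 1, 9: 1}
K = M = 3  # class width k and number m of consecutive values per class


def moments(dist):
    """Return (mean, nu_2, nu_3, nu_4), the moments being central."""
    n = sum(dist.values())
    mean = F(sum(v * f for v, f in dist.items()), n)

    def central(r):
        return sum(f * (v - mean) ** r for v, f in dist.items()) / n

    return mean, central(2), central(3), central(4)


def grouped(start):
    """Group FREQ into classes of width K whose first class begins at `start`."""
    out = {}
    for v, f in FREQ.items():
        mark = start + K * ((v - start) // K) + (K - 1) // 2  # class mark
        out[mark] = out.get(mark, 0) + f
    return out


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def check_example():
    mean0, mu2, _, mu4 = moments(FREQ)
    lam4 = mu4 - 3 * mu2 ** 2
    stats = [moments(grouped(s)) for s in (1, 0, -1)]
    d = [s[0] - mean0 for s in stats]          # grouped mean minus ungrouped mean
    nu2 = [s[1] for s in stats]
    nu3 = [s[2] for s in stats]
    nu4 = [s[3] for s in stats]

    nu20, nu02 = avg(nu2), avg(x ** 2 for x in d)
    mu2_formula = nu20 + nu02 - (1 - F(1, M ** 2)) * F(K ** 2, 12)

    l4 = avg(a - 3 * b ** 2 for a, b in zip(nu4, nu2))
    var_nu2 = avg(b ** 2 for b in nu2) - nu20 ** 2
    lam31 = avg(a * x for a, x in zip(nu3, d))
    lam22 = avg(b * x ** 2 for b, x in zip(nu2, d)) - nu20 * nu02
    lam04 = avg(x ** 4 for x in d) - 3 * nu02 ** 2
    lam4_formula = (l4 + 3 * var_nu2 + 4 * lam31 + 6 * lam22 + lam04
                    + (1 - F(1, M ** 4)) * F(K ** 4, 120))
    return mu2, mu2_formula, lam4, lam4_formula


if __name__ == "__main__":
    print(check_example())
```

Working in `Fraction` keeps the denominators $60^2$ and $60^4$ exact, so the two sides can be compared for equality.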

It may appear at first glance that since

$$\bar\nu_{rs} = E[\nu_r (\delta\nu_1')^s]$$

and could be expressed by means of the notation $\nu_{1s:\nu_r\nu_1'}$, the notation in (12)
and (13) could be made more uniform. It could be, but at the expense of greater
complexity in these two sets of results. Moreover, it is convenient that $\lambda_{rs}$
is expressible in terms of $\bar\nu_{kl}$'s in precisely the same way that product seminvariants
are ordinarily expressible in terms of product moments.

Pierce's results differ from the above not only in their mode of derivation but
also in the fact that they express $\bar\nu_{r0}$'s and $l_r$'s in terms of the characteristics of
the ungrouped distribution and moments and seminvariants of moments in the
grouped distributions. Thus as they stand they are not formulae for Sheppard's
corrections.

Finally it must be remarked that in comparison with the usual formulae for
Sheppard's corrections, the formulae (10) and (13) introduce quantities the
magnitudes of which are not known in general except that ordinarily they are
quite small. It is hoped that results on this point will be forthcoming soon.

REFERENCES 

[1] H. C. Carver, "The fundamental nature and proof of Sheppard's adjustments,"
Annals of Math. Stat., Vol. 7 (1936), pp. 154-163.

[2] J. A. Pierce, "A study of a universe of n finite populations with application to moment-function
adjustments for grouped data," Annals of Math. Stat., Vol. 11 (1940),
pp. 311-334.

[3] C. C. Craig, "Sheppard's corrections for a discrete variable," Annals of Math. Stat.,
Vol. 7 (1936), pp. 55-61.







ON THE ANALYSIS OF VARIANCE IN CASE OF MULTIPLE 
CLASSIFICATIONS WITH UNEQUAL CLASS FREQUENCIES 

By Abraham Wald 1 
Columbia University 

In a previous paper² the author considered the case of a single criterion of
classification with unequal class frequencies and derived confidence limits for
$\sigma'^2/\sigma^2$ where $\sigma'^2$ denotes the variance associated with the classification, and $\sigma^2$
denotes the residual variance. The scope of the present paper is to extend
those results to the case of multiple classifications with unequal class frequencies.

For the sake of simplicity of notations we will derive the required confidence
limits in the case of a two-way classification, the extension to multiple classifications
being obvious.

Consider a two-way classification with $p$ rows and $q$ columns. Let $y$ be the
observed variable, and let $n_{ij}$ be the number of observations in the $i$th row and
$j$th column. Denote by $y_{ij}^{(k)}$ the $k$th observation on $y$ in the $i$th row and $j$th
column ($k = 1, \dots, n_{ij}$). Let the total number of observations be $N$. We
order the $N$ observations and let $y_\alpha$ be the $\alpha$th observation on $y$ in that order.
Consider the variables:

$$t, t_1, \dots, t_p, v_1, \dots, v_q,$$

and denote by $t_\alpha$ the $\alpha$th observation on $t$, by $t_{i\alpha}$ the $\alpha$th observation on $t_i$ and
by $v_{j\alpha}$ the $\alpha$th observation on $v_j$. The values of $t_\alpha$, $t_{i\alpha}$ and $v_{j\alpha}$ are defined as
follows:

$t_\alpha = 1$ ($\alpha = 1, \dots, N$),

$t_{i\alpha} = 1$ if $y_\alpha$ lies in the $i$th row,

$t_{i\alpha} = 0$ if $y_\alpha$ does not lie in the $i$th row,

$v_{j\alpha} = 1$ if $y_\alpha$ lies in the $j$th column,

$v_{j\alpha} = 0$ if $y_\alpha$ does not lie in the $j$th column.

We make the assumptions

$$y_{ij}^{(k)} = x_{ij}^{(k)} + \epsilon_i + \eta_j,$$

where the variates $x_{ij}^{(k)}$, $\epsilon_i$, $\eta_j$ ($i = 1, \dots, p$; $j = 1, \dots, q$; $k = 1, \dots, n_{ij}$)
are independently and normally distributed, the variance of $x_{ij}^{(k)}$ is $\sigma^2$, the variance
of $\epsilon_i$ is $\sigma'^2$, the variance of $\eta_j$ is $\sigma''^2$, and the mean values of $\epsilon_i$ and $\eta_j$ are
zero.


1 Research under a grant-in-aid from the Carnegie Corporation of New York.
2 "A note on the analysis of variance with unequal class frequencies," Annals of Math.
Stat., Vol. 11 (1940).



Let the sample regression of $y$ on $t, t_1, \dots, t_{p-1}, v_1, \dots, v_{q-1}$ be

$$Y = at + b_1 t_1 + \cdots + b_{p-1} t_{p-1} + d_1 v_1 + \cdots + d_{q-1} v_{q-1}.$$

We want to derive confidence limits for

$$\frac{\sigma'^2}{\sigma^2} = \lambda^2.$$
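
The sample regression on the indicator variables can be sketched concretely. The example below is an illustration under stated assumptions: the observations are invented, the helper names (`design_row`, `solve`, `fit`) are hypothetical, and the normal equations are solved by plain Gauss-Jordan elimination in exact arithmetic rather than by any method prescribed in the paper.

```python
from fractions import Fraction as F


def design_row(i, j, p, q):
    """Values of t, t_1..t_{p-1}, v_1..v_{q-1} for an observation in row i, column j."""
    return [1] + [int(i == a) for a in range(1, p)] + [int(j == b) for b in range(1, q)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[F(v) for v in row] + [F(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


def fit(obs, p, q):
    """Least-squares regression of y on the indicator variables.

    obs is a list of (row, column, y). Returns the coefficients
    (a, b_1..b_{p-1}, d_1..d_{q-1}), the fitted values, the residual sum of
    squares and its degrees of freedom N - p - q + 1.
    """
    X = [design_row(i, j, p, q) for i, j, _ in obs]
    y = [F(val) for _, _, val in obs]
    m = p + q - 1
    A = [[sum(r[a] * r[b] for r in X) for b in range(m)] for a in range(m)]
    rhs = [sum(r[a] * yv for r, yv in zip(X, y)) for a in range(m)]
    coef = solve(A, rhs)
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in X]
    rss = sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))
    return coef, fitted, rss, len(obs) - p - q + 1


# Invented data: p = 2 rows, q = 3 columns, unequal cell frequencies.
OBS = [(1, 1, 3), (1, 1, 5), (1, 2, 4), (1, 3, 7), (1, 3, 6), (1, 3, 8),
       (2, 1, 2), (2, 2, 5), (2, 2, 3), (2, 3, 9)]

if __name__ == "__main__":
    coef, fitted, rss, dof = fit(OBS, 2, 3)
    print(coef, rss, dof)
```

The variables $t_p$ and $v_q$ are left out of the design because they are linear combinations of the others, which is why only $p + q - 1$ coefficients are estimated.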

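Let us introduce the notations:

$$\begin{aligned} \sum_\alpha t_\alpha^2 &= a_{00}, \\ \sum_\alpha t_\alpha t_{i\alpha} &= a_{0i} & (i = 1, \dots, p - 1), \\ \sum_\alpha t_\alpha v_{j\alpha} &= a_{0,\, p-1+j} & (j = 1, \dots, q - 1), \\ \sum_\alpha t_{i\alpha} t_{j\alpha} &= a_{ij} & (i, j = 1, \dots, p - 1), \\ \sum_\alpha t_{i\alpha} v_{j\alpha} &= a_{i,\, p-1+j} & (i = 1, \dots, p - 1;\ j = 1, \dots, q - 1), \\ \sum_\alpha v_{i\alpha} v_{j\alpha} &= a_{p-1+i,\, p-1+j} & (i, j = 1, \dots, q - 1), \\ \| c_{ij} \| &= \| a_{ij} \|^{-1} & (i, j = 0, 1, \dots, p + q - 2). \end{aligned}$$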


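Let the regression of $x_{ij}^{(k)}$ on $t, t_1, \dots, t_{p-1}, v_1, \dots, v_{q-1}$ be

$$X = a^* t + b_1^* t_1 + \cdots + b_{p-1}^* t_{p-1} + d_1^* v_1 + \cdots + d_{q-1}^* v_{q-1}.$$

The regression of $\epsilon_i + \eta_j$ on the same independent variables is evidently equal to

$$\epsilon_1 t_1 + \cdots + \epsilon_p t_p + \eta_1 v_1 + \cdots + \eta_q v_q = (\epsilon_p + \eta_q) t + (\epsilon_1 - \epsilon_p) t_1 + \cdots + (\epsilon_{p-1} - \epsilon_p) t_{p-1} + (\eta_1 - \eta_q) v_1 + \cdots + (\eta_{q-1} - \eta_q) v_{q-1},$$

since $t_p = t - t_1 - \cdots - t_{p-1}$ and $v_q = t - v_1 - \cdots - v_{q-1}$. Hence

$$b_i = b_i^* + (\epsilon_i - \epsilon_p), \quad (i = 1, \dots, p - 1), \tag{1}$$

and therefore

$$\sigma_{b_i b_j} = [c_{ij} + (1 + \delta_{ij}) \lambda^2] \sigma^2, \quad (i, j = 1, \dots, p - 1), \tag{2}$$

where $\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 0$ for $i \ne j$ and $\delta_{ii} = 1$. Denote
$c_{ij} + (1 + \delta_{ij}) \lambda^2$ by $c'_{ij}$. Since the expected value of $b_i^*$ is equal to zero, on
account of (1) also the expected value of $b_i$ is equal to zero. Let

$$\| g_{ij} \| = \| c'_{ij} \|^{-1}, \quad (i, j = 1, \dots, p - 1).$$

Then

$$\frac{1}{\sigma^2} \sum_{i=1}^{p-1} \sum_{j=1}^{p-1} g_{ij} b_i b_j \tag{3}$$

has the $\chi^2$-distribution with $p - 1$ degrees of freedom. The expression

$$\frac{1}{\sigma^2} \sum_{\alpha=1}^{N} (y_\alpha - Y_\alpha)^2 \tag{4}$$

has the $\chi^2$-distribution with $N - p - q + 1$ degrees of freedom. The expressions
(3) and (4) are independently distributed. Hence

$$\frac{N - p - q + 1}{p - 1} \cdot \frac{\sum \sum g_{ij} b_i b_j}{\sum_\alpha (y_\alpha - Y_\alpha)^2} \tag{5}$$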

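has the $F$-distribution (analysis of variance distribution). We will now show
that (5) is a monotonic function of $\lambda^2$. It is known that $\sum \sum g_{ij} b_i b_j$ is invariant
under linear transformations, i.e.

$$\sum \sum g_{ij} b_i b_j = \sum \sum g'_{ij} b'_i b'_j,$$

where $b'_i$ is an arbitrary linear function, say $\mu_{i1} b_1 + \cdots + \mu_{i,p-1} b_{p-1}$, of $b_1, \dots, b_{p-1}$ ($i = 1, \dots, p - 1$) and

$$\| g'_{ij} \| = \left\| \frac{\sigma_{b'_i b'_j}}{\sigma^2} \right\|^{-1}.$$

We can choose the matrix $\| \mu_{ij} \|$ such that

$$\epsilon'_i = \mu_{i1} (\epsilon_1 - \epsilon_p) + \cdots + \mu_{i,p-1} (\epsilon_{p-1} - \epsilon_p), \quad (i = 1, \dots, p - 1),$$

are independently distributed and $\sigma^2_{\epsilon'_i} = \sigma'^2$. The coefficients $\mu_{ij}$ of course do
not depend on $\sigma'$. Writing $b'^*_i$ for the same linear function of $b_1^*, \dots, b_{p-1}^*$, we have

$$\sigma_{b'_i b'_j} = \sigma_{b'^*_i b'^*_j} + \delta_{ij} \sigma'^2, \quad (\delta_{ij} = \text{Kronecker delta}).$$

Now let

$$b''_i = r_{i1} b'_1 + \cdots + r_{i,p-1} b'_{p-1}, \quad (i = 1, \dots, p - 1),$$

where $\| r_{ij} \|$ is an orthogonal matrix and is chosen such that $b''^*_1, \dots, b''^*_{p-1}$
are independently distributed. On account of the orthogonality of $\| r_{ij} \|$ we
obviously have

$$\sigma^2_{b''_i} = \sigma^2_{b''^*_i} + \sigma'^2; \qquad \sigma_{b''_i b''_j} = 0 \text{ for } i \ne j.$$

Hence

$$\sum \sum g_{ij} b_i b_j = \sum_{i=1}^{p-1} \frac{\sigma^2\, b''^2_i}{\sigma^2_{b''^*_i} + \lambda^2 \sigma^2}. \tag{6}$$

The right hand side of (6) is evidently a monotonic function of $\lambda^2$, which proves
our statement. The endpoints of the confidence interval for $\lambda^2$ are the roots
in $\lambda^2$ of the equations

$$\frac{N - p - q + 1}{p - 1} \cdot \frac{\sum \sum g_{ij} b_i b_j}{\sum_\alpha (y_\alpha - Y_\alpha)^2} = F_2; \qquad \frac{N - p - q + 1}{p - 1} \cdot \frac{\sum \sum g_{ij} b_i b_j}{\sum_\alpha (y_\alpha - Y_\alpha)^2} = F_1, \tag{7}$$

where $F_2$ denotes the upper, and $F_1$ the lower critical value of $F$.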



The derivation of the required confidence limits in case of classifications in
more than two ways can be carried out in the same way and I shall merely state
here the results.

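Consider $r$ criteria of classification and denote by $p_u$ the number of classes
in the $u$th classification ($u = 1, \dots, r$). Denote by $n_{i_1 \cdots i_r}$ the number of
observations which belong to the $i_1$th class of the first classification, the $i_2$th class
of the second classification, $\dots$, and to the $i_r$th class of the $r$th classification.
Let $y^{(k)}_{i_1 \cdots i_r}$ be the $k$th observation on $y$ in the set of observations belonging to
the classes mentioned above ($k = 1, \dots, n_{i_1 \cdots i_r}$). We make the assumption

$$y^{(k)}_{i_1 \cdots i_r} = x^{(k)}_{i_1 \cdots i_r} + \epsilon^{(1)}_{i_1} + \cdots + \epsilon^{(r)}_{i_r},$$

where the variates $x^{(k)}_{i_1 \cdots i_r}$, $\epsilon^{(u)}_{i_u}$ ($i_u = 1, \dots, p_u$; $u = 1, \dots, r$; $k = 1, \dots, n_{i_1 \cdots i_r}$)
are independently and normally distributed, the variance of $x^{(k)}_{i_1 \cdots i_r}$ is $\sigma^2$, the
variance of $\epsilon^{(u)}_{i_u}$ is $\sigma_u^2$, and the mean value of $\epsilon^{(u)}_{i_u}$ is zero ($i_u = 1, \dots, p_u$;
$u = 1, \dots, r$).

Let $N$ be the total number of observations. We order the observations in a
certain order and denote by $y_\alpha$ the $\alpha$th observation in that order ($\alpha = 1, \dots, N$).
Consider the variables:

$$t, \ t^{(u)}_{i_u}, \quad (u = 1, \dots, r;\ i_u = 1, \dots, p_u),$$

and denote by $t_\alpha$ the $\alpha$th observation on $t$ and by $t^{(u)}_{i_u \alpha}$ the $\alpha$th observation on
$t^{(u)}_{i_u}$. The values of $t_\alpha$ and $t^{(u)}_{i_u \alpha}$ are given as follows:

$t_\alpha = 1$ ($\alpha = 1, \dots, N$),

$t^{(u)}_{i_u \alpha} = 1$ if $y_\alpha$ lies in the $i_u$th class of the $u$th classification,

$t^{(u)}_{i_u \alpha} = 0$ if $y_\alpha$ does not lie in the $i_u$th class of the $u$th classification.

Let the sample regression of $y$ on $t$, $t^{(u)}_{i_u}$ ($u = 1, \dots, r$; $i_u = 1, \dots, p_u - 1$) be given by

$$Y = at + \sum_{u=1}^{r} \sum_{i_u = 1}^{p_u - 1} b^{(u)}_{i_u} t^{(u)}_{i_u}.$$

Let the covariance of $b^{(u)}_{i_u}$ and $b^{(u)}_{k_u}$ be given by $c^{(u)}_{i_u k_u} \sigma^2$ under the assumption
that $\sigma_1 = \sigma_2 = \cdots = \sigma_r = 0$. The matrix $\| c^{(u)}_{i_u k_u} \|$ ($i_u, k_u = 1, \dots, p_u - 1$)
can be calculated by known methods of the theory of least squares. Let

$$\| g^{(u)}_{i_u k_u} \| = \| c^{(u)}_{i_u k_u} + (1 + \delta_{i_u k_u}) \lambda_u^2 \|^{-1}, \quad (i_u, k_u = 1, \dots, p_u - 1),$$

where $\delta_{i_u k_u}$ is the Kronecker delta and $\lambda_u^2 = \sigma_u^2 / \sigma^2$. Then the lower and upper
confidence limits for $\lambda_u^2$ are given by the roots in $\lambda_u^2$ of the equations

$$\frac{N - \sum_{v=1}^{r} p_v + r - 1}{p_u - 1} \cdot \frac{\sum_{i_u} \sum_{k_u} g^{(u)}_{i_u k_u} b^{(u)}_{i_u} b^{(u)}_{k_u}}{\sum_\alpha (y_\alpha - Y_\alpha)^2} = F_2; \qquad \frac{N - \sum_{v=1}^{r} p_v + r - 1}{p_u - 1} \cdot \frac{\sum_{i_u} \sum_{k_u} g^{(u)}_{i_u k_u} b^{(u)}_{i_u} b^{(u)}_{k_u}}{\sum_\alpha (y_\alpha - Y_\alpha)^2} = F_1, \tag{8}$$

where $F_2$ is the upper and $F_1$ the lower critical value of the analysis of variance
distribution with $p_u - 1$ and $N - \sum_{v=1}^{r} p_v + r - 1$ degrees of freedom. In
case of a single criterion of classification the confidence limits (8) are identical
with those given in my previous paper.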


THE FREQUENCY DISTRIBUTION OF A GENERAL MATCHING 

PROBLEM 


By T. N. E. Greville 
Bureau of the Census 

1. Introduction. This paper considers the matching of two decks of cards of
arbitrary composition, and the complete frequency distribution of correct
matchings is obtained, thus solving a problem proposed by Stevens.¹ It is also
shown that the results can be interpreted in terms of a contingency table.

Generalizing a problem considered by Greenwood,² let us consider the matching
of two decks of cards consisting of $t$ distinct kinds, all the cards of each kind being
identical. The first or "call" deck will be composed of $i_1$ cards of the first kind,
$i_2$ of the second, etc., such that

$$i_1 + i_2 + \cdots + i_t = n;$$

and the second or "target" deck will contain $j_1$ cards of the first kind, $j_2$ of the
second, etc., such that

$$j_1 + j_2 + \cdots + j_t = n.$$

Any of the $i$'s or $j$'s may be zero. It is desired to calculate, for a given arrangement
of the "call" deck, the number of possible arrangements of the "target"
deck which will produce exactly $r$ matchings between them ($r = 0, 1, 2, \dots, n$).
It is clear that these frequencies are independent of the arrangement of the call
deck. For convenience the call deck may be thought of as arranged so that all
the cards of the first kind come first, followed by all those of the second kind,
and so on.


2. Formulae for the frequencies. Let us consider the number of arrangements
of the target deck which will match the cards in the $k_1$th, $k_2$th, $\dots$, $k_s$th
positions in the call deck, regardless of whether or not matchings occur elsewhere.
Let the cards in these $s$ positions in the call deck consist of $c_1$ of the first kind,
$c_2$ of the second, etc. Then:

$$c_1 + c_2 + \cdots + c_t = s.$$

The number of such arrangements of the target deck is

$$\frac{(n - s)!}{\prod_{h=1}^{t} (j_h - c_h)!}. \tag{1}$$


1 W. L. Stevens, Annals of Eugenics, Vol. 8 (1937), pp. 238-244.

2 J. A. Greenwood, Annals of Math. Stat., Vol. 9 (1938), pp. 56-69.



For fixed values of the $c$'s, the $s$ specified positions may be selected in

$$\prod_{h=1}^{t} \frac{i_h!}{c_h! (i_h - c_h)!} \tag{2}$$

ways.

Consider now the expression

$$V_s = \sum \frac{(n - s)! \prod_{h=1}^{t} i_h!}{\prod_{h=1}^{t} c_h! (i_h - c_h)! (j_h - c_h)!}, \tag{3}$$

obtained by summing the product of (1) and (2) over all sets of values of the
numbers $c_1, c_2, \dots, c_t$ satisfying the conditions:

$$0 \le c_h \le i_h, \quad c_h \le j_h, \quad \text{and} \quad \sum_{h=1}^{t} c_h = s.$$

Let $W_s$ denote the number of arrangements of the target deck which result in
exactly $s$ matchings. Then it is evident that $V_s$ exceeds $W_s$, since the former
includes those arrangements which give more than $s$ matchings, and these,
moreover, are counted more than once. Consider an arrangement which
produces $u$ matchings, where $u > s$. Such an arrangement will be counted
once in $V_s$ for every set of $s$ matchings which can be selected from the total of
$u$, that is, $\binom{u}{s}$ times. In other words,

$$V_r = W_r + \binom{r+1}{r} W_{r+1} + \binom{r+2}{r} W_{r+2} + \cdots + \binom{n}{r} W_n.$$

It has been shown³ that the solution of these equations is

$$W_r = V_r - \binom{r+1}{r} V_{r+1} + \binom{r+2}{r} V_{r+2} - \cdots + (-1)^{n-r} \binom{n}{r} V_n. \tag{4}$$


3. Computation of the frequencies. Equations (3) and (4) apparently give
the solution of the problem, but in practice the labor of carrying out the summation
indicated in (3) would often be very great. However, (3) may be rewritten
in the form

$$V_s = \frac{(n - s)!\, H_s}{\prod_{h=1}^{t} j_h!}, \tag{5}$$

where

$$H_s = \sum \prod_{h=1}^{t} \frac{i_h!\, j_h!}{c_h! (i_h - c_h)! (j_h - c_h)!}.$$

3 H. Geiringer, Annals of Math. Stat., Vol. 9 (1938), p. 262.





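It will be seen that $H_s$ is the coefficient of $x^s$ in the product

$$\prod_{h=1}^{t} \left( \sum_{k=0}^{l_h} \frac{i_h!\, j_h!}{k! (i_h - k)! (j_h - k)!} x^k \right), \tag{6}$$

where $l_h$ denotes the smaller of $i_h$ and $j_h$. The factor $\prod_{h=1}^{t} j_h!$ was included in
$H_s$ in order to make the coefficients in the polynomials of (6) always integers.
Equation (4) may now be written in the form

$$W_r = \sum_{s=r}^{n} (-1)^{s-r} \binom{s}{r} \frac{(n - s)!\, H_s}{\prod_{h=1}^{t} j_h!},$$

or

$$W_r = \frac{1}{r! \prod_{h=1}^{t} j_h!} \sum_{s=r}^{n} (-1)^{s-r} \frac{s!\, (n - s)!}{(s - r)!} H_s, \tag{7}$$

a form which lends itself to actual computation.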
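
The computation described by (6) and (7) can be sketched in a few lines. The code below is an illustration only: the function names `matching_frequencies` and `brute_force` are assumptions made here, and the brute-force enumeration is included merely as an independent check that is feasible for very small decks.

```python
from itertools import permutations
from math import factorial, prod


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for x, ca in enumerate(a):
        for y, cb in enumerate(b):
            out[x + y] += ca * cb
    return out


def matching_frequencies(i, j):
    """Return [W_0, ..., W_n] from (6) and (7); i and j list the numbers of each kind."""
    n = sum(i)
    H = [1]
    for ih, jh in zip(i, j):
        # One polynomial factor of (6); its coefficients are integers.
        factor = [factorial(ih) * factorial(jh)
                  // (factorial(k) * factorial(ih - k) * factorial(jh - k))
                  for k in range(min(ih, jh) + 1)]
        H = poly_mul(H, factor)
    H += [0] * (n + 1 - len(H))
    denom = prod(factorial(jh) for jh in j)
    W = []
    for r in range(n + 1):
        total = sum((-1) ** (s - r) * factorial(s) * factorial(n - s) * H[s]
                    // factorial(s - r)
                    for s in range(r, n + 1))
        W.append(total // (factorial(r) * denom))
    return W


def brute_force(i, j):
    """Count distinct target arrangements by number of matchings (small decks only)."""
    call = [h for h, ih in enumerate(i) for _ in range(ih)]
    target = [h for h, jh in enumerate(j) for _ in range(jh)]
    counts = [0] * (len(call) + 1)
    for arrangement in set(permutations(target)):
        counts[sum(c == t for c, t in zip(call, arrangement))] += 1
    return counts


if __name__ == "__main__":
    print(matching_frequencies((2, 1, 1), (1, 2, 1)))
    print(brute_force((2, 1, 1), (1, 2, 1)))
```

Building $H_s$ by multiplying the $t$ polynomials avoids enumerating the sets $(c_1, \dots, c_t)$ required by the sum in (3).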


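4. Factorial moments. The factorial moments of the frequency distribution
of the number of matchings are easy to compute. Let $m_s$ denote the $s$th factorial
moment, so that

$$m_s = \frac{\sum_{r=s}^{n} r^{(s)} W_r}{\sum_{r=0}^{n} W_r}. \tag{8}$$

Substituting from (4),

$$\sum_{r=s}^{n} r^{(s)} W_r = \sum_{r=s}^{n} \left( r^{(s)} \sum_{u=r}^{n} (-1)^{u-r} \binom{u}{r} V_u \right).$$

Reversing the order of summation and simplifying,

$$\sum_{r=s}^{n} r^{(s)} W_r = \sum_{u=s}^{n} \left( u^{(s)} V_u \sum_{r=s}^{u} (-1)^{u-r} \binom{u-s}{r-s} \right) = s!\, V_s.$$

Hence,

$$m_s = \frac{s!\, V_s}{V_0}, \tag{9}$$

and from (5) and (8),

$$m_s = \frac{s!\, (n - s)!}{n!} H_s. \tag{10}$$
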
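5. Mean and variance. From (6)

$$H_1 = \sum_{h=1}^{t} i_h j_h \tag{11}$$

and

$$H_2 = \frac{1}{2} \sum_{h=1}^{t} i_h (i_h - 1) j_h (j_h - 1) + \sum_{h < k} i_h i_k j_h j_k. \tag{12}$$

Hence the mean number of matchings is

$$m_1 = \frac{1}{n} \sum_{h=1}^{t} i_h j_h. \tag{13}$$

The variance $\mu_2$ is

$$m_2 + m_1 - m_1^2 = \frac{1}{n^2 (n - 1)} \left[ n \sum_{h=1}^{t} i_h (i_h - 1) j_h (j_h - 1) + 2n \sum_{h < k} i_h i_k j_h j_k + n(n - 1) \sum_{h=1}^{t} i_h j_h - (n - 1) \left( \sum_{h=1}^{t} i_h j_h \right)^2 \right],$$

or

$$\mu_2 = \frac{1}{n^2 (n - 1)} \left[ \left( \sum_{h=1}^{t} i_h j_h \right)^2 - n \sum_{h=1}^{t} i_h j_h (i_h + j_h) + n^2 \sum_{h=1}^{t} i_h j_h \right]. \tag{14}$$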

In the special case 4 = 4 = • • ■ = 4 = 4 these formulae become 

These formulae have previously been given by Stevens,⁴ and those for the
special case also by Greenwood. The maximal conditions for the variance,
given by Greenwood for this particular case, apparently cannot be put in a simple
form for the general case.
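
Formulae (13) and (14) are easy to check against direct enumeration for small decks. The following sketch assumes nothing beyond those two formulae; the function names and the sample deck compositions are illustrative choices made here.

```python
from fractions import Fraction
from itertools import permutations


def mean_and_variance(i, j):
    """Mean (13) and variance (14) of the number of matchings."""
    n = sum(i)
    s1 = sum(a * b for a, b in zip(i, j))            # sum of i_h j_h
    s2 = sum(a * b * (a + b) for a, b in zip(i, j))  # sum of i_h j_h (i_h + j_h)
    mean = Fraction(s1, n)
    variance = Fraction(s1 * s1 - n * s2 + n * n * s1, n * n * (n - 1))
    return mean, variance


def enumerated_mean_and_variance(i, j):
    """The same two quantities by listing every distinct target arrangement."""
    call = [h for h, ih in enumerate(i) for _ in range(ih)]
    target = [h for h, jh in enumerate(j) for _ in range(jh)]
    scores = [sum(c == t for c, t in zip(call, arrangement))
              for arrangement in set(permutations(target))]
    mean = Fraction(sum(scores), len(scores))
    second = Fraction(sum(s * s for s in scores), len(scores))
    return mean, second - mean ** 2


if __name__ == "__main__":
    print(mean_and_variance((2, 1, 1), (1, 2, 1)))
    print(enumerated_mean_and_variance((2, 1, 1), (1, 2, 1)))
```

Every distinct arrangement of the target deck is weighted equally, which is the sampling scheme under which the frequencies $W_r$ were defined.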


6. Unequal decks. Suppose the call deck contains $m$ cards, $m < n$, and is to
be matched with $m$ cards selected from the target deck. It can be assumed
without loss of generality that the first $m$ cards in any arrangement of the target
deck are the ones to be used. The formulae of this paper can be applied to this


4 W. L. Stevens, Annals of Eugenics, loc. cit.; Psychol. Review, Vol. 46 (1939), pp. 142-150.






more general problem by the expedient of imagining $n - m$ blank cards to be
added at the end of the call deck and regarding these as an additional kind.
It is thus apparent that formulae (13) and (14) apply without modification to
this altered situation.


7. Application to contingency table. Stevens⁵ has considered the distribution
of entries in a contingency table with fixed marginal totals, and has pointed out
that the problem of matching two decks of cards may be dealt with from that
standpoint. A contingency table classifies data into $n$ columns and $m$ rows,
and we may consider the row as indicating the kind of card which occupies
a given position in the call deck, the columns having the same function with
respect to the target deck. Stevens defines a quantity $c$ as the sum of entries
in a prescribed set of cells, subject to the condition that no two cells of the set
are in the same row or column, and mentions as unsolved the problem of the
exact sampling distribution of $c$.

We now have at our disposal the machinery for solving this problem. Following
Stevens's notation, let $a_1, a_2, \dots, a_m$ denote the fixed row totals and
$b_1, b_2, \dots, b_n$ the fixed column totals, while $x_{rs}$ denotes the frequency of the
cell in the $r$th row and the $s$th column. Then, let $c = \sum_{h=1}^{l} x_{hh}$, where $l$ does
not exceed either $m$ or $n$. Imagine two decks of $N$ cards $\left(N = \sum_{h=1}^{m} a_h = \sum_{h=1}^{n} b_h\right)$,
the first containing $a_1$ cards of one kind, $a_2$ of another, etc., and the second
containing $b_1$ cards of one kind, $b_2$ of another, etc. Moreover, let the $h$th kind
in the first deck and the $h$th kind in the second deck be the same kind ($h =
1, 2, \dots, l$), the other kinds being all different. Evidently $c$ is the number of
matchings between the two decks. Hence, the methods of this paper can be
used to obtain the distribution of $c$. The formulae we have obtained agree with
those for the expected value and variance of $c$ given by Stevens.


>hj, 


ON METHODS OF SOLVING NORMAL EQUATIONS 

By Paul G. Hoel
University of California, Los Angeles 

There seems to be considerable disagreement concerning what is the most
satisfactory method of solving a set of normal equations. Since such information
as errors of estimate and significance of results is usually desired in addition
to the solution, in its broader aspects the problem is one of deciding what is the
most satisfactory method of calculating the inverse of a symmetric matrix.

For equations with several unknowns some compact systematic method of 


5 W. L. Stevens, Annals of Eugenics, loc. cit.






calculation is necessary to eliminate much of the labor involved in the ordinary 
method of calculating the inverse from its definition. Among the more common 
of such systematic methods are those associated with the names of Chio, 1 Gauss, 1 
Doolittle, 2 and Aitken. 3 In addition, A. A Albert 4 recently called attention 
to a method implicit in elementary matrix theory There are also various 
iterative schemes, and schemes which are but slight variations of the above 
methods In this note only the methods associated with the above names will 
be considered, and for convenience they will be labeled with those names, regard¬ 
less of who should be given credit for them. 

The purpose of this note is to show that when the calculation of the inverse is
systematized, all of the above methods are fundamentally equivalent and merely
involve a different arrangement of work. Consequently, any advantage in
calculating time for any particular method will arise through such features as a
simpler technique or less copying, rather than through fewer multiplications and
divisions.

By the method of Chio is meant the evaluation of determinants by the pivotal
method of reduction. Since all of the methods mentioned above use pivotal
reduction, the method of Chio will not be treated as a distinct method.
Furthermore, since Gauss' method is incorporated in that of Aitken, it will be
necessary to consider only the methods of Aitken, Doolittle, and Albert as distinct.

First consider the method of Albert, which is based on the following matrix
properties. Let the matrix $A$ be subjected to a sequence of row transformations
leading to the matrix $A'$. Then, writing $A = IA$, it follows from a theorem in
matrix theory that $A' = I'A$, and consequently that $A'A^{-1} = I'$. If row
transformations are chosen which make $A' = I$, then $A^{-1} = I'$. This states
that if the same row transformations are applied to the identity matrix as were
used to reduce $A$ to the identity matrix, then the resulting matrix will be the
desired inverse. The customary manner of reducing $A$ to $I$ is to work for zeros
in columns as follows:



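$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\longrightarrow
\begin{bmatrix}
1 & \dfrac{a_{12}}{a_{11}} & \dfrac{a_{13}}{a_{11}} & \cdots & \dfrac{a_{1n}}{a_{11}} \\
0 & 1 & \dfrac{b_{23}}{b_{22}} & \cdots & \dfrac{b_{2n}}{b_{22}} \\
0 & 0 & b_{33} - b_{32}\dfrac{b_{23}}{b_{22}} & \cdots & b_{3n} - b_{32}\dfrac{b_{2n}}{b_{22}} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & b_{n3} - b_{n2}\dfrac{b_{23}}{b_{22}} & \cdots & b_{nn} - b_{n2}\dfrac{b_{2n}}{b_{22}}
\end{bmatrix}
$$

where new letters are introduced for new elements after each reduction. After
zeros are obtained below the main diagonal, zeros are obtained above the
diagonal by starting with the last column. If now these operations are
performed in the same order on $I$, the result will be $A^{-1}$.

¹ See, for example, Whittaker and Robinson, The Calculus of Observations, p. 71 and p. 234.

² See, for example, Croxton and Cowden, Applied General Statistics, 1939, p. 716.

³ Roy. Soc. Edin. Proc., Vol. 57 (1936-37), p. 172.

⁴ Am. Math. Monthly, Vol. 48, No. 3 (1941), p. 198.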
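The row-transformation procedure just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `invert_by_row_operations` is hypothetical, exact rational arithmetic is used in place of a calculating machine, and no row interchanges are attempted, so a zero pivot simply raises an error.

```python
from fractions import Fraction


def invert_by_row_operations(a):
    """Reduce A to I by row operations, applying the same operations to I.

    Hypothetical helper for illustration; assumes every pivot is nonzero.
    """
    n = len(a)
    work = [[Fraction(v) for v in row] for row in a]
    inv = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    # First set of operations: unit pivots and zeros below the main diagonal.
    for col in range(n):
        pivot = work[col][col]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        work[col] = [v / pivot for v in work[col]]
        inv[col] = [v / pivot for v in inv[col]]
        for r in range(col + 1, n):
            f = work[r][col]
            work[r] = [x - f * y for x, y in zip(work[r], work[col])]
            inv[r] = [x - f * y for x, y in zip(inv[r], inv[col])]

    # Second set of operations: zeros above the diagonal, last column first.
    for col in range(n - 1, 0, -1):
        for r in range(col):
            f = work[r][col]
            work[r] = [x - f * y for x, y in zip(work[r], work[col])]
            inv[r] = [x - f * y for x, y in zip(inv[r], inv[col])]

    return inv


if __name__ == "__main__":
    for row in invert_by_row_operations([[2, 1], [1, 3]]):
        print([str(v) for v in row])
```

The two loops correspond to the two sets of operations referred to later in the note, which is why they are kept separate rather than merged into a single elimination pass.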

Next consider the method of Aitken, which is based on the evaluation of a
bordered determinant, namely,

$$
\begin{vmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} & 0 \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} & 1 \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} & 0 \\
0 & \cdots & -1 & \cdots & 0 & 0
\end{vmatrix}
= \text{cofactor of } a_{ij}.
$$

To obtain $A^{-1}$ it is merely necessary to evaluate determinants of this type and
divide them by $|A|$. Aitken's method evaluates all such determinants
simultaneously, using Chio's reduction technique in much the same manner as illustrated
above with Albert's method. Thus,


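$$
\left[\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1 \\
\hline
-1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & -1 & 0 & 0 & \cdots & 0
\end{array}\right]
$$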



When zeros are obtained below the main diagonal to the left of the vertical
dividing line, the matrix in the lower right section will be $A^{-1}$. This follows from
the fact that the elements of this matrix will be the evaluations of bordered
determinants, like those of the previous paragraph, divided by $a_{11} b_{22} \cdots = |A|$.

It will be observed that the operations on $A$ in Albert's method which produce
zeros below the main diagonal are the same as those which occur above the
horizontal dividing line in Aitken's method. This set of operations is performed
simultaneously on $I$, since the upper right section of Aitken's scheme is $I$.
Furthermore, obtaining a zero for an element below the horizontal line and to the
left of the vertical line is equivalent to obtaining a zero for the element
corresponding to the same row and column in the section above the horizontal,
provided the preceding columns contain zeros above the diagonal. But obtaining
zeros above the main diagonal of $A$ constitutes the second set of operations in
Albert's method to obtain $A' = I$. Thus, the operations in Aitken's method
which produce zeros in a given column for elements above the horizontal line
are merely the first set of operations in Albert's method, while those which
produce zeros below the horizontal line are the second set of operations in reverse
order. Since, in Aitken's scheme, the first set of operations is performed on $I$
in the upper right section and the results are transferred a row at a time to the
lower right section, where they are in turn operated upon by the second set of
operations, this lower right section is merely $I$ operated upon by the entire set
of operations of Albert's method. Consequently, Aitken's and Albert's methods
are the same except for the order in which operations are performed and
differences arising therefrom. Since Aitken's method performs these operations more
compactly, it is to be preferred to that of Albert.
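The bordered arrangement can be imitated directly. The sketch below is an assumption-laden illustration rather than Aitken's own layout: the name `invert_by_bordered_scheme` is hypothetical, pivots are divided out instead of forming Chio cross products, and exact fractions stand in for machine arithmetic. It builds the array with $A$ and $I$ above the horizontal line and $-I$ and zeros below it, clears the left half below the diagonal, and reads the inverse from the lower right section.

```python
from fractions import Fraction


def invert_by_bordered_scheme(a):
    """Return the lower right block after clearing the left half below the diagonal.

    Hypothetical helper for illustration; assumes every pivot is nonzero.
    """
    n = len(a)
    top = [
        [Fraction(v) for v in a[i]] + [Fraction(int(i == j)) for j in range(n)]
        for i in range(n)
    ]
    bottom = [
        [Fraction(-int(i == j)) for j in range(n)] + [Fraction(0)] * n
        for i in range(n)
    ]
    rows = top + bottom

    for col in range(n):
        pivot = rows[col][col]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        # Clear column `col` in every row beneath the pivot row,
        # including the rows below the horizontal dividing line.
        for r in range(col + 1, 2 * n):
            f = rows[r][col] / pivot
            if f != 0:
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[col])]

    return [row[n:] for row in rows[n:]]


if __name__ == "__main__":
    for row in invert_by_bordered_scheme([[2, 1], [1, 3]]):
        print([str(v) for v in row])
```

No separate back solution appears here: the rows below the horizontal line absorb that work as each column is cleared, which is the compactness the text attributes to the scheme.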

Next consider the method of Doolittle, which is described by following
the instructions given in the first column in the accompanying table.
The forward solution is completed after $n$ such sectional operations. For a
given $k$ column, the backward solution is obtained as usual by substitution in
the last row of each section taken in reverse order.

If all summations in each section are performed in pairs and the sums recorded
each time, rather than being performed in one operation, the forward solution
of the Doolittle method will be found to be a rearrangement of the work occurring
above the horizontal line in Aitken's method. Thus the first lines of each
section give the matrix above the horizontal line in Aitken's scheme. Then,
except for signs, I' and the sums of the first two lines of the remaining sections
give the result of Aitken's first sequence of operations above the horizontal.
Then, except for signs, II' and the sums of the first three lines of the remaining
sections give the result of Aitken's second sequence of operations above the
horizontal, etc.

The back solution involves precisely the same operations as those making up
the second set of Albert's sequence of operations to obtain zeros above the main
diagonal. Since these were shown to be a rearrangement of operations in
Aitken's method, it follows that the methods of Aitken and Doolittle are the
same except for the order of operations and differences arising therefrom. Hence
all three methods are basically the same when systematized for a calculating
machine.

Because of this equivalence, the number of necessary multiplications and
divisions will be the same for all three methods, and will be found to be
ln(n + 1). Since Aitken’s method is to be preferred to that of Albert, it will 
suffice to compare the methods of Aitken and Doolittle for calculating
convenience.

The Doolittle method possesses several distinct advantages. First, its multi¬ 
plications occur a row at a time with one of the factors constant for that row; 
consequently the keyboard remains unchanged for a given row of operations. 



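$$
\begin{array}{l|ccccc|cccc}
 & 1 & 2 & 3 & \cdots & n & k_1 & k_2 & \cdots & k_n \\
\hline
\text{I and SI} & a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & -1 & 0 & \cdots & 0 \\
\text{I}' & -1 & -\dfrac{a_{12}}{a_{11}} & -\dfrac{a_{13}}{a_{11}} & \cdots & -\dfrac{a_{1n}}{a_{11}} & \dfrac{1}{a_{11}} & 0 & \cdots & 0 \\
\hline
\text{II} & a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & 0 & -1 & \cdots & 0 \\
\text{SI}\cdot\text{I}'_2 & -a_{12} & -a_{12}\dfrac{a_{12}}{a_{11}} & -a_{13}\dfrac{a_{12}}{a_{11}} & \cdots & -a_{1n}\dfrac{a_{12}}{a_{11}} & \dfrac{a_{12}}{a_{11}} & 0 & \cdots & 0 \\
\text{SII} & 0 & b_{22} = a_{22} - a_{12}\dfrac{a_{12}}{a_{11}} & b_{23} = a_{23} - a_{13}\dfrac{a_{12}}{a_{11}} & \cdots & b_{2n} = a_{2n} - a_{1n}\dfrac{a_{12}}{a_{11}} & \dfrac{a_{12}}{a_{11}} & -1 & \cdots & 0 \\
\text{II}' & 0 & -1 & -\dfrac{b_{23}}{b_{22}} & \cdots & -\dfrac{b_{2n}}{b_{22}} & -\dfrac{a_{12}}{a_{11} b_{22}} & \dfrac{1}{b_{22}} & \cdots & 0 \\
\hline
\text{III} & a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & 0 & 0 & \cdots & 0 \\
\text{SI}\cdot\text{I}'_3 & -a_{13} & -a_{12}\dfrac{a_{13}}{a_{11}} & -a_{13}\dfrac{a_{13}}{a_{11}} & \cdots & -a_{1n}\dfrac{a_{13}}{a_{11}} & \dfrac{a_{13}}{a_{11}} & 0 & \cdots & 0 \\
\text{SII}\cdot\text{II}'_3 & 0 & -b_{23} & -b_{23}\dfrac{b_{23}}{b_{22}} & \cdots & -b_{2n}\dfrac{b_{23}}{b_{22}} & -\dfrac{a_{12}}{a_{11}}\dfrac{b_{23}}{b_{22}} & \dfrac{b_{23}}{b_{22}} & \cdots & 0 \\
\text{SIII} & 0 & 0 & c_{33} & \cdots & c_{3n} & c_{3,n+1} & \dfrac{b_{23}}{b_{22}} & \cdots & 0 \\
\text{III}' & 0 & 0 & -1 & \cdots & -\dfrac{c_{3n}}{c_{33}} & -\dfrac{c_{3,n+1}}{c_{33}} & -\dfrac{b_{23}}{b_{22} c_{33}} & \cdots & 0 \\
\hline
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots
\end{array}
$$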


Aitken's method, however, consists of calculating successive cross products,
which requires clearing of the keyboard after each such operation. Secondly,
there are fewer additions in the Doolittle method. It sums $i$ quantities at a
time in section $i$, while Aitken's cross products always involve the sum of two
quantities. Because of the necessity of calculating the complements of negative
sums, this difference becomes important when the number of variables is large.
A third feature in favor of the Doolittle method is the ease of performing the
calculations without previous experience. It may be easier to understand how
to calculate cross products, but actually the calculations of the Doolittle method
are easier to perform. Aitken's method requires some experience with it, if one
is to avoid repeating certain calculations which would result from calculating all
cross products mechanically. The comparative amount of copying in the two
methods depends upon the number of variables involved.

From the above considerations, it may be concluded that the Doolittle method
is to be preferred among those considered in this paper for solving a set of normal
equations or calculating the inverse of a symmetric matrix. However, if a
single calculating technique is desired which can be used for nonsymmetrical
equations as well, then the method of Aitken is to be preferred.






CONDITIONS THAT THE ROOTS OF A POLYNOMIAL BE LESS THAN 
UNITY IN ABSOLUTE VALUE 

By Paul A. Samuelson 

Massachusetts Institute of Technology 

1. Introduction. In econometric business cycle analysis, probability
theory, and numerical mathematical computation the problem of convergence of
repeated iterations arises. The solution of the difference equations defining
such a process can in a wide variety of cases be shown to be stable in the sense of
converging to a limit if a certain associated polynomial

$$f(x) = p_0 x^n + p_1 x^{n-1} + \cdots + p_n = 0 \tag{1}$$

has roots whose moduli are all less than unity.

Thus, for "timeless" linear difference equation systems of the most general
type, convertible into normal form,

$$Q_i(t + 1) = \sum_{j=1}^{n} a_{ij} Q_j(t), \qquad (i = 1, \ldots, n), \tag{2}$$

the polynomial is the characteristic or determinantal equation,

$$f(x) = | a_{ij} - x\delta_{ij} | = 0, \tag{3}$$

which when expanded out is of the form (1). The roots of this equation, when
multiplied by suitable polynomials in $t$, give the exact solution of the problem
in the form

$$Q_i(t) = \sum_{j=1}^{m} g_{ij}(t)\, x_j^{\,t}, \tag{4}$$

where $m$ is the number of distinct roots, and the $g$'s are polynomials of degree
one less than the multiplicity of the respective root. If complex roots occur,
they do so in conjugate pairs and can be combined to form damped, undamped,
or anti-damped harmonic terms. All terms go to zero as $t$ approaches infinity
if, and only if, the absolute value of each $x$ is less than unity.

For non-linear systems the exact solution does not take this form, but in the 
neighborhood of an equilibrium point the roots of an associated polynomial, 
except in singular cases, do determine the stability of the system. 

As far as the writer is aware, there does not appear in the literature an account
of necessary and sufficient conditions for the roots of a polynomial to be less than
unity in absolute value. This is in contrast to a related problem which arises
in connection with the investigation of stability of dynamical systems defined by
differential equations. These have associated with them a polynomial whose
roots provide solutions in the form

$$g_{ij}(t)\, e^{x_j t}, \tag{5}$$

or for non-linear systems infinite power series in such terms. It is required,
therefore, to determine complete conditions under which the real parts of all
roots must be negative.

This problem has been solved by Routh¹ in a manner which leaves little to be
desired. Determinantal expression of his conditions in a slightly modified form
was made by Hurwitz,² who apparently was unaware of Routh's work, and by
Frazer and Duncan,³ who were unaware of the Hurwitz results. A brief outline
of Routh's mode of attack will prove instructive in dealing with the problem
at hand.


2. Routhian analysis of sign of real parts of roots. Routh realized
that the condition that all coefficients be positive (the leading coefficient having
been made so) was necessary, but not sufficient unless all the roots were real.
But a "derived" equation of degree $n(n - 1)/2$ whose roots equal the sums of the
roots of the original equation taken two at a time has real roots which are simple
sums of the real parts of those of the original equation. In consequence, it is
necessary and sufficient that the coefficients of the original and the "derived"
equation all be positive.

Thus, valid necessary and sufficient conditions are presented. However, they
are disadvantageous from two points of view. First, they are not all independent,
being $n(n + 1)/2$ conditions in number, whereas only $n$ are necessary.
Secondly, despite several ingenious methods devised by Routh, it is not easy to
compute them in the general case.

Recognizing these difficulties, he therefore began anew from an entirely
different angle. Utilizing a theorem of Cauchy concerning the relationship
between the behavior of a polynomial on a closed contour in the complex domain
and the number of roots within that closed curve, he derived necessary and
sufficient conditions, which may be written in the slightly more convenient
determinantal form of Hurwitz and Frazer and Duncan as follows:

$$
T_0 = p_0 > 0; \quad T_1 = p_1 > 0; \quad
T_2 = \begin{vmatrix} p_1 & p_3 \\ p_0 & p_2 \end{vmatrix} > 0; \quad
T_3 = \begin{vmatrix} p_1 & p_3 & p_5 \\ p_0 & p_2 & p_4 \\ 0 & p_1 & p_3 \end{vmatrix} > 0; \quad \cdots; \quad
T_n = \begin{vmatrix}
p_1 & p_3 & \cdots & p_{2n-1} \\
p_0 & p_2 & \cdots & p_{2n-2} \\
0 & p_1 & \cdots & p_{2n-3} \\
0 & p_0 & \cdots & p_{2n-4} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & p_n
\end{vmatrix} > 0. \tag{6}
$$

The law of formation of these determinants is obvious. In the first row the
odd $p$'s starting with the first are listed. Within each column the subscripts diminish
one unit at a time. Any $p$ with negative subscript derived by this formula is
treated as zero, and all $p$'s of subscript higher than the degree of the equation
are set equal to zero. With this convention, for $p_0$ made positive, complete and
independent necessary conditions are that all principal minors of $T_n$ formed by
deleting successively the last row and column must be positive. These
conditions are $n$ in number and are independent.

¹ E. J. Routh, A Treatise on the Stability of a Given State of Motion (London, 1877),
Chaps. 2 and 3; Advanced Rigid Dynamics, 6th ed., London, 1905, Chap. 6.

² Hurwitz, Math. Ann., Vol. 46 (1895), p. 521.

³ R. A. Frazer and W. J. Duncan, Royal Soc. Proc., Series A, Vol. 124 (1929), p. 642.
Also R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge
University Press, 1938, pp. 151-155.
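The law of formation stated for the determinants in (6) translates directly into an index rule, which the following sketch tries to make concrete. The function names (`hurwitz_matrix`, `leading_minors`, `all_real_parts_negative`) are hypothetical, the coefficient list is assumed to be ordered as $p_0, p_1, \ldots, p_n$, and determinants are evaluated by ordinary elimination in exact fractions rather than by any scheme of Routh's.

```python
from fractions import Fraction


def hurwitz_matrix(p):
    """Build the n-by-n array of (6): first row p1, p3, ...; subscripts drop by one per row."""
    n = len(p) - 1

    def coeff(k):
        # Subscripts outside 0..n are treated as zero.
        return p[k] if 0 <= k <= n else 0

    return [[coeff(2 * (c + 1) - 1 - r) for c in range(n)] for r in range(n)]


def determinant(m):
    """Determinant by elimination with row interchange where needed."""
    m = [[Fraction(v) for v in row] for row in m]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot_row = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != col:
            m[col], m[pivot_row] = m[pivot_row], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return det


def leading_minors(p):
    """Return T_1, ..., T_n, the leading principal minors of the array."""
    h = hurwitz_matrix(p)
    return [determinant([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]


def all_real_parts_negative(p):
    """Apply the conditions of (6) with p0 made positive."""
    return p[0] > 0 and all(t > 0 for t in leading_minors(p))


if __name__ == "__main__":
    print(all_real_parts_negative([1, 3, 2]))     # z^2 + 3z + 2
    print(all_real_parts_negative([1, 1, 1, 6]))  # z^3 + z^2 + z + 6
```

The second polynomial factors as $(z + 2)(z^2 - z + 3)$, so it has all coefficients positive and yet a pair of roots with positive real part, which illustrates why positivity of the coefficients alone is not sufficient.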


3. Complete, independent, necessary and sufficient conditions.
Corresponding to Routh's first attack on the problem, we might consider an equation
of degree $n(n - 1)/2$ whose roots equal the products two at a time of the original
equation's. If this equation and the original equation have real roots less than
unity in absolute value, our problem is solved. This is guaranteed if, and only
if, two further transformed equations with roots equal to the squares minus unity
of the roots of the original and derived equations respectively all have positive
coefficients. These conditions are necessary and sufficient, but not independent,
and cannot be easily computed in the general case. Therefore, I follow Routh's
example and approach the problem from a different point of view.

When the roots of $f(x) = 0$ are plotted in the complex plane, they must all lie
within the unit circle if their absolute values are to be less than unity, and
conversely. We might therefore attempt to apply Cauchy's theorem. However,
it is not necessary to do so. Routh has shown what the conditions are that there
be no roots in the right-hand half-plane. Can we find a complex transformation
of variables which carries the unit circle into the left-hand half-plane?

The answer is in the affirmative. The linear complex transformation

$$z = \frac{x + 1}{x - 1}, \qquad x = \frac{z + 1}{z - 1} \tag{7}$$

will accomplish this. But after substituting for $x$ its value in terms of $z$, we
cease to have a polynomial but rather a rational function of $z$ as follows:

$$f(x) = f\!\left(\frac{z + 1}{z - 1}\right) = \frac{\sum_{i=0}^{n} p_i (z + 1)^{n-i} (z - 1)^{i}}{(z - 1)^n} = 0. \tag{8}$$

We need only consider the polynomial in the numerator, i.e.,

$$\varphi(z) = \sum_{i=0}^{n} \pi_i z^{n-i} = 0. \tag{9}$$

In order that the roots of the original equation be less than unity in absolute value,
it is necessary and sufficient that the real parts of the roots of equation (9) be negative.
Once we determine the coefficients ($\pi_i$) in terms of the original $p$'s, we can easily
apply Routh's theorems. This yields $n + 1$ necessary and sufficient conditions,
all of which are independent.
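The mapping property claimed for transformation (7) is easy to probe numerically. The sketch below is only a spot check at a few sample points, not a proof; the function names `to_half_plane` and `from_half_plane` are hypothetical labels for the two directions of (7).

```python
def to_half_plane(x: complex) -> complex:
    """z = (x + 1) / (x - 1), the first form of transformation (7)."""
    return (x + 1) / (x - 1)


def from_half_plane(z: complex) -> complex:
    """x = (z + 1) / (z - 1), the same formula solved for x."""
    return (z + 1) / (z - 1)


if __name__ == "__main__":
    samples = [0j, 0.5j, -0.9 + 0j, 0.3 + 0.4j, 2 + 0j, -3 + 0j, 1.5 - 2j]
    for x in samples:
        z = to_half_plane(x)
        print(f"|x| = {abs(x):.3f}  ->  Re z = {z.real:+.3f}")
```

Since $\operatorname{Re} z = (|x|^2 - 1)/|x - 1|^2$, the sign of the real part of $z$ is the sign of $|x| - 1$, which is what the printed values exhibit.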



Expanding the numerator of the right-hand side of (8) and collecting terms,
the following explicit formulas for the $\pi$'s are directly obtained:

$$\pi_i = \sum_{j=0}^{n} \sum_{k=0}^{m(i,j)} p_j \; {}_{n-j}C_{i-k} \, (-1)^{k} \; {}_{j}C_{k}, \tag{10}$$

where

$${}_{v}C_{w} = \frac{v!}{(v - w)!\, w!},$$

and

$m(i, j)$ = the smaller of $i$ and $j$.

For fourth and higher degree equations literal substitution, while always
possible, results in complicated expressions. It is preferable, therefore, to
compute the $\pi$'s numerically and then apply the conditions of (6) directly.
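Numerical computation of the $\pi$'s from (10) can be sketched as follows. The name `pi_coefficients` is hypothetical, the input is assumed to be the list $p_0, \ldots, p_n$, and the binomial coefficient ${}_{v}C_{w}$ is taken to vanish when $w > v$, which is the convention `math.comb` already follows.

```python
from math import comb


def pi_coefficients(p):
    """Coefficients pi_0, ..., pi_n of the transformed polynomial (9), via formula (10)."""
    n = len(p) - 1
    result = []
    for i in range(n + 1):
        total = 0
        for j in range(n + 1):
            for k in range(min(i, j) + 1):
                # comb(n - j, i - k) is zero whenever i - k exceeds n - j.
                total += p[j] * comb(n - j, i - k) * (-1) ** k * comb(j, k)
        result.append(total)
    return result


if __name__ == "__main__":
    # For f(x) = x^2 the numerator of (8) is (z + 1)^2.
    print(pi_coefficients([1, 0, 0]))
    # General quadratic with p0 = 1, p1 = 0.5, p2 = 0.06 (roots -0.2 and -0.3).
    print(pi_coefficients([1, 0.5, 0.06]))
```

For a quadratic the formula reduces to $\pi_0 = p_0 + p_1 + p_2$, $\pi_1 = 2(p_0 - p_2)$, $\pi_2 = p_0 - p_1 + p_2$, which gives a quick hand check on any implementation.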

Other necessary conditions can be easily derived, but they will be dependent
upon these. Thus, each $\pi$ must be positive; but this is not, by itself, sufficient.
Or, adding $\pi_0$ and $\pi_n$ we find

$$\pi_0 + \pi_n = 2(p_0 + p_2 + p_4 + \cdots) > 0, \tag{11}$$

i.e., the sum of the even $p$'s must be positive. Similarly, still other linear sums
of other $\pi$'s will result in cancellation of certain of the $p$'s. Except on special
occasions there is probably no labor saved by utilizing conditions derived in
this way.

One obvious but useful necessary condition will be stated without proof.
If one forms polynomials from subsets of the coefficients of a given "stable"
polynomial formed by arbitrary "cuts" which leave adjacent coefficients in
unchanged order and introduce no gaps within each set, then the resulting
polynomials will all be stable.

Special sufficiency conditions also can be developed. Carmichael⁴ presents
certain inequalities between the absolute values of the largest root and the
coefficients of the original equation. For special problems these may be fruitfully
applied.


4. Example. In conclusion I apply the conditions derived here to a
well-known numerical equation determined statistically by Tinbergen⁵ in the analysis
of economic fluctuations. It is a fourth order difference equation with constant
coefficients,

$$Z_t - .398 Z_{t-1} + .220 Z_{t-2} - .013 Z_{t-3} - .027 Z_{t-4} = 0, \tag{12}$$

with the associated indicial equation

$$f(x) = x^4 - .398x^3 + .220x^2 - .013x - .027 = 0. \tag{13}$$

Its roots have been computed and are known to be less than unity in absolute
value. This may be verified by computing

$$
\begin{aligned}
\pi_0 &= .782 > 0 \\
\pi_1 &= 3.338 > 0 \\
\pi_2 &= 5.398 > 0 \\
\pi_3 &= 4.878 > 0 \\
\pi_4 &= 1.604 > 0 \\
T_2 &= 14.204 > 0 \\
T_3 &= 43.177 > 0
\end{aligned} \tag{14}
$$

To compute the same results by cross-multiplication the work is arranged as
follows:

$$
\begin{array}{ccc}
\pi_0 & \pi_2 & \pi_4 \\
.782 & 5.398 & 1.604 \\[4pt]
\pi_1 & \pi_3 & \\
3.338 & 4.878 & \\[4pt]
\pi_1\pi_2 - \pi_0\pi_3 & \pi_3\pi_4 - 0 & \\
14.204 & 7.824 & \\[4pt]
\pi_3(\pi_1\pi_2 - \pi_0\pi_3) - \pi_1\pi_3\pi_4 & & \\
43.177 & &
\end{array}
$$

It may be remarked that the presence of a negative coefficient anywhere in
the table is an immediate indication of instability, and that there is no necessity
to continue the computation until a negative sign appears in a leading coefficient.
This fact often saves much labor.

⁴ R. D. Carmichael, Amer. Math. Soc. Bull., Vol. 24 (1918), pp. 286-296.

⁵ J. Tinbergen, Business Cycles in the United States, 1919-1932, League of Nations, 1939,
p. 140.
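An independent, if crude, way to look at the stability of equation (12) is simply to iterate it. The sketch below is not the test derived in this paper: it solves (12) for $Z_t$, starts from arbitrary initial values (an assumption made purely for illustration), and watches whether the sequence dies away. The function name `iterate_difference_equation` is hypothetical.

```python
def iterate_difference_equation(coeffs, initial, steps):
    """Iterate Z_t = c1*Z_{t-1} + c2*Z_{t-2} + ... from the given starting values.

    `initial` lists the starting values oldest first and must be at least
    as long as `coeffs`.
    """
    history = list(initial)
    order = len(coeffs)
    for _ in range(steps):
        recent_newest_first = history[-order:][::-1]
        history.append(sum(c * z for c, z in zip(coeffs, recent_newest_first)))
    return history


if __name__ == "__main__":
    # Equation (12) solved for Z_t.
    coeffs = [0.398, -0.220, 0.013, 0.027]
    path = iterate_difference_equation(coeffs, [1.0, 1.0, 1.0, 1.0], 60)
    for t in (4, 10, 20, 40, 60):
        print(t, abs(path[t + 3]))
```

A decaying path from one set of starting values does not by itself establish stability, since the starting values might happen to exclude a growing component; it is offered only as a sanity check alongside the conditions above.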


VALUES OF MILLS' RATIO OF AREA TO BOUNDING ORDINATE AND OF 
THE NORMAL PROBABILITY INTEGRAL FOR LARGE VALUES 
OF THE ARGUMENT 

By Robert D. Gordon 
Scripps Institution of Oceanography

A pair of simple inequalities is proved which constitute upper and lower
bounds for the ratio $R_x$,¹ valid for $x > 0$. The writer has failed to encounter
these inequalities in the literature, hence it seems worthwhile to present them
for whatever value they may have.

¹ J. P. Mills, "Table of ratio: area to bounding ordinate, for any portion of the normal
curve," Biometrika, Vol. 18 (1926), pp. 395-400. Also Pearson's tables, Part II, Table III.



The function $R_x$ is defined by

$$R_x = e^{x^2/2} \int_x^\infty e^{-t^2/2}\, dt. \tag{1}$$

The following relations between $R = R_x$ and its derivatives are easily established
by direct differentiations and substitutions:

$$\frac{dR}{dx} - xR + 1 = 0, \tag{2}$$

$$\frac{d^2R}{dx^2} = x\frac{dR}{dx} + R = \frac{x^2 + 1}{x}\frac{dR}{dx} + \frac{1}{x}, \tag{3}$$

$$\frac{d^3R}{dx^3} = \left(x + \frac{2x}{x^2 + 1}\right)\frac{d^2R}{dx^2} - \frac{2}{x^2 + 1}. \tag{4}$$

Also by ordinary rules

$$R_x > 0, \tag{5}$$

$$\lim_{x \to \infty} xR_x = 1. \tag{6}$$

1°. Suppose that at any point $x_1 > 0$, $x_1 R_{x_1} > 1$. Then by (2) $dR/dx > 0$,
and $R_x$ would continue to increase with increasing $x$: still more, $xR_x$ would
continue to increase, hence we should have $xR_x > 1$ for $x \ge x_1$, which contradicts
(6). Therefore we find $xR_x \le 1$ for $x > 0$, and

$$R_x \le \frac{1}{x}, \qquad \text{if } x > 0, \tag{7}$$

which establishes the required upper inequality.

2°. Suppose that at any point $x_2 > 0$, $d^2R/dx^2 < 0$. Then by (4) $d^3R/dx^3 =
(d/dx)(d^2R/dx^2) < 0$ at this point. Since these derivatives are continuous this
implies that for all $x > x_2$, $d^2R/dx^2 < [d^2R/dx^2]_{x = x_2} < 0$. Then we get the
inequalities, for $x > x_2$,

$$\frac{dR}{dx} < \left[\frac{dR}{dx}\right]_2 + (x - x_2)\left[\frac{d^2R}{dx^2}\right]_2,$$

$$R < R_{x_2} + (x - x_2)\left[\frac{dR}{dx}\right]_2 + \tfrac{1}{2}(x - x_2)^2\left[\frac{d^2R}{dx^2}\right]_2,$$

where $[\;]_2$ indicates evaluation at $x = x_2$. Since $[d^2R/dx^2]_2 < 0$, this implies
that for sufficiently large $x$, $R_x < 0$, which contradicts (5). It follows then
that (3) is positive, and substitution of (2) gives

$$R_x \ge \frac{x}{x^2 + 1}. \tag{8}$$

We combine (7) and (8) in the double inequality:

$$\frac{x}{x^2 + 1} \le R_x \le \frac{1}{x}, \qquad \text{if } x > 0. \tag{9}$$

This gives for the probability integral the corresponding inequality

$$\frac{x}{x^2 + 1}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \le \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt \le \frac{1}{x}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \tag{10}$$

It can easily be shown (for $x > 0$) that equalities in (9) and (10) are impossible.
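The double inequality (9) can be checked numerically at a few points. The sketch below assumes the identity $\int_x^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$ and uses the standard library's `math.erfc`; the function names are hypothetical, and the argument is kept moderate because `math.exp(x * x / 2)` overflows for large $x$.

```python
import math


def mills_ratio(x):
    """R_x = exp(x^2/2) * integral from x to infinity of exp(-t^2/2) dt."""
    return math.exp(x * x / 2) * math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))


def gordon_bounds(x):
    """Lower and upper bounds of (9), valid for x > 0."""
    return x / (x * x + 1), 1 / x


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        lower, upper = gordon_bounds(x)
        print(f"x = {x:5.1f}   {lower:.6f} < {mills_ratio(x):.6f} < {upper:.6f}")
```

The two bounds differ by $1/\bigl(x(x^2+1)\bigr)$, so they close in on $R_x$ quickly as $x$ grows, which is the sense in which they serve for large values of the argument.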





THE ANNALS
of
MATHEMATICAL
STATISTICS

(Founded by H. C. Carver)

The Official Journal of the Institute
of Mathematical Statistics


Contents

Distribution of the Ratio of the Mean Square Successive Difference
to the Variance. John von Neumann

Some Examples of Asymptotically Most Powerful Tests. Abraham Wald

On the Distribution of the Quotient of Two Chance Variables. J. H. Curtiss

Some Generalizations of the Logarithmic Mean and of Similar
Means of Two Variates Which Become Indeterminate When
the Two Variates are Equal. Edward L. Dodd

A Study of R. A. Fisher's z-Distribution and the Related F-Distribution.
Leo A. Aroian

The Doolittle Technique. Paul S. Dwyer

Notes:

A Problem in Estimation. Joseph F. Daly

Confidence Limits for an Unknown Distribution Function. A. Kolmogoroff

Corrections to a Paper on the Uniqueness Problem of Moments. M. G. Kendall

Announcement Concerning Computation of Mathematical Tables

Report of the Chicago Meeting of the Institute

Abstracts of Papers


Vol. XII, No. 4, December, 1941

















DISTRIBUTION OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE 
DIFFERENCE TO THE VARIANCE 


By John von Neumann 


Institute for Advanced Study¹

1. Introduction. Let $x_1, \ldots, x_n$ be variables representing $n$ successive
observations in a population which obeys a distribution law

$$c\, e^{-(x - \xi)^2 / 2\sigma^2}\, dx,$$

i.e. which is normal, with the mean $\xi$ and the standard deviation $\sigma$. For the
sample we define as usual the mean,

$$\bar{x} = \frac{1}{n} \sum_{\mu=1}^{n} x_\mu,$$

the variance,

$$s^2 = \frac{1}{n} \sum_{\mu=1}^{n} (x_\mu - \bar{x})^2,$$

and also the mean square successive difference

$$\delta^2 = \frac{1}{n - 1} \sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2.$$

The reasons for the study of the distribution of the mean square successive
difference $\delta^2$, in itself as well as in its relationship to the variance $s^2$, have been
set forth in a previous publication,² to which the reader is referred. The
distribution of $\delta^2$, and in particular its moments, were also studied there. The present
paper is devoted to the investigation of the ratio

$$\eta = \frac{\delta^2}{s^2}.$$

A comparison of the observed value of $\eta$ with that distribution is particularly
suited as a basis of the judgment whether the observations $x_1, \ldots, x_n$ are
independent or whether a trend exists. (Cf. sections 1 and 2, loc. cit.²)
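The three sample quantities just defined are simple to compute. The sketch below follows the definitions literally, with divisor $n$ for $s^2$ and $n - 1$ for $\delta^2$; the function name `von_neumann_ratio` is a hypothetical label, and a sample with all observations equal (so that $s^2 = 0$) is not handled.

```python
def von_neumann_ratio(xs):
    """Ratio eta = delta^2 / s^2 of mean square successive difference to variance."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    delta2 = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return delta2 / s2


if __name__ == "__main__":
    print(von_neumann_ratio([1, 2, 3, 4]))    # steady trend
    print(von_neumann_ratio([1, -1, 1, -1]))  # rapid alternation
```

A steady trend makes successive differences small relative to the overall spread and so pulls $\eta$ down, while rapid alternation pushes it up; this is the behavior that makes the ratio a candidate basis for judging whether a trend exists.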

The moments of $\eta$ have already been determined by J. D. Williams by a
different method.³ Williams' results have been checked by W. J. Dixon at the
suggestion of S. S. Wilks, whose stimulating interest has been largely responsible
for the undertaking of the series of papers on $\delta^2$ and $\delta^2/s^2$. The present rather
exhaustive discussion, however, brings out several other essential characteristics
of this statistic, and provides the key to some very effective computational
methods. It is further hoped that the reader will find that the mathematical
methods used and the generalizations indicated have an interest of their own.

From the latter point of view the final results of sections 5 and 7, concerning
the distribution of values of quadratic and of Hermitian forms, may deserve
special attention.

¹ Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen
Proving Ground.

² John von Neumann, R. H. Kent, H. R. Bellinson, B. I. Hart, "The mean square
successive difference," Annals of Math. Stat., Vol. 12 (1941), pp. 153-162.

2. Diagonalization of the quadratic forms and replacement by a spherical
mean. Since $\delta^2$ and $s^2$ are unchanged when we replace each $x_\mu$ by $x_\mu - \xi$, we
may assume $\xi = 0$. Then the distribution law of $x$ is

$$c\, e^{-x^2/2\sigma^2}\, dx, \quad \text{and that of } x_1, \ldots, x_n \text{ is } \prod_{\mu=1}^{n} c\, e^{-x_\mu^2/2\sigma^2}\, dx_\mu,$$

i.e.

$$c^n e^{-\sum_{\mu=1}^{n} x_\mu^2 / 2\sigma^2}\, dx_1 \cdots dx_n.$$

Any linear orthogonal transformation of the $x_1, \ldots, x_n$ leaves $\sum_{\mu=1}^{n} x_\mu^2$ and
$dx_1 \cdots dx_n$ unchanged, hence the above distribution law will likewise be left
unchanged. Thus, we may subject the two quadratic forms $\delta^2$, $s^2$ to any
simultaneous linear, orthogonal transformation.

Consider one carrying $x_1, \ldots, x_n$ into, say $x'_1, \ldots, x'_n$, which brings the
quadratic form $(n - 1)\delta^2$ into the diagonal form, say $\sum_{\mu=1}^{n} A_\mu x'^2_\mu$. Such a
transformation does not affect the characteristic values of the quadratic forms,⁴ and
these characteristic values are obviously $A_1, \ldots, A_n$ in the case of $\sum_{\mu=1}^{n} A_\mu x'^2_\mu$.
Consequently $A_1, \ldots, A_n$ are the characteristic values of the original quadratic
form $(n - 1)\delta^2$. We shall determine them as such in the next section.

Clearly we always have $(n - 1)\delta^2 \ge 0$, hence all $A_\mu \ge 0$. Some $A_\mu$ may
equal 0, say $k$ $(= 0, 1, \ldots, n)$ of them, which we can arrange to be
$A_{n-k+1}, \ldots, A_n$. $(n - 1)\delta^2 = 0$ is thus equivalent to $x'_1 = \cdots = x'_{n-k} = 0$, i.e. to $n - k$
independent conditions. On the other hand this amounts obviously to $x_1 = \cdots =
x_n$, and these are $n - 1$ independent conditions. So $k = 1$ and consequently
$A_1, \ldots, A_{n-1} > 0$, $A_n = 0$. And our linear orthogonal transformation must
carry the $x$-vectors with $x_1 = \cdots = x_n$ into the $x'$-vectors with $x'_1 =
\cdots = x'_{n-1} = 0$. Among the former, $\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}$ has the length 1; among
the latter only $0, \ldots, 0, \pm 1$ have. Hence these correspond to each other.
Now the scalar (inner) product of two vectors is an orthogonal invariant, that
of a vector $x_1, \ldots, x_n$ with $\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}$ is $\sqrt{n}\,\bar{x}$, that of a vector $x'_1, \ldots, x'_n$
with $0, \ldots, 0, \pm 1$ is $\pm x'_n$, hence

$$\sqrt{n}\,\bar{x} = \pm x'_n.$$

Put $x_\mu = \bar{x} + u_\mu$. Then clearly $\sum_{\mu=1}^{n} u_\mu = 0$. Hence

$$\sum_{\mu=1}^{n} x_\mu^2 = n\bar{x}^2 + \sum_{\mu=1}^{n} u_\mu^2 = x'^2_n + n s^2.$$

Owing to the orthogonality, the left-hand side is equal to $\sum_{\mu=1}^{n} x'^2_\mu$, therefore

$$n s^2 = \sum_{\mu=1}^{n-1} x'^2_\mu.$$

Remembering that $A_n = 0$, we also have

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} A_\mu x'^2_\mu.$$

Consequently

$$\eta = \frac{\delta^2}{s^2} = \frac{n}{n - 1}\, \frac{\sum_{\mu=1}^{n-1} A_\mu x'^2_\mu}{\sum_{\mu=1}^{n-1} x'^2_\mu}.$$

The distribution law is, as we know, the same in $x'_1, \ldots, x'_n$ as in $x_1, \ldots, x_n$,
namely

$$c^n e^{-\sum_{\mu=1}^{n} x'^2_\mu / 2\sigma^2}\, dx'_1 \cdots dx'_n.$$

Thus $x'_1, \ldots, x'_n$ are independent. $\eta$ depends on $x'_1, \ldots, x'_{n-1}$ only, hence we
may disregard $x'_n$ altogether, and use the distribution law of the $x'_1, \ldots, x'_{n-1}$,

$$c^{n-1} e^{-\sum_{\mu=1}^{n-1} x'^2_\mu / 2\sigma^2}\, dx'_1 \cdots dx'_{n-1}.$$

³ J. D. Williams, "Moments of the ratio of the mean square successive difference to the
mean square difference in samples from a normal universe," Annals of Math. Stat., Vol. 12
(1941), pp. 239-241. Cf. also L. C. Young, "On randomness in ordered sequences," Annals
of Math. Stat., Vol. 12 (1941), pp. 293-300.

⁴ For the properties of matrices and quadratic forms cf. e.g.: J. H. M. Wedderburn,
Lectures on Matrices, Amer. Math. Soc. Colloquium Publications, Vol. 17, New York, 1934.
In the present context cf. mainly Chapters II and VI.


With respect to $x'_1, \ldots, x'_{n-1}$ we may now state that the $x'_1, \ldots, x'_{n-1}$
distribution of $\eta$ can be obtained by determining first the distribution of $\eta$ over
every spherical surface

$$\sum_{\mu=1}^{n-1} x'^2_\mu = r^2$$

and then averaging these distributions with the weights $\psi(r)\, dr$, where $\psi(r)\, dr$
is the probability of the spherical shell from $r$ to $r + dr$ with respect to our
original $x'_1, \ldots, x'_{n-1}$ distribution law. (It happens to be $c' e^{-r^2/2\sigma^2} r^{n-2}\, dr$, but
this is immaterial.)

Since the $x'_1, \ldots, x'_{n-1}$ distribution law is obviously spherically symmetric
in these variables, the first-mentioned distributions over the spherical surfaces
are readily obtained by assigning each piece of the surfaces in question its own
relative, $n - 2$-dimensional area as weight.

Since $\eta$ is a homogeneous function of $x'_1, \ldots, x'_{n-1}$ of order zero, these spherical
surface distributions of $\eta$ are the same for all $r$. Consequently we can replace
all these $r$ by, say $r = 1$, and the subsequent averaging over the $r$ may be omitted
altogether.

Finally, since we restrict ourselves to $r = 1$, i.e. to the spherical surface

$$\sum_{\mu=1}^{n-1} x'^2_\mu = 1,$$

the denominator of $\eta$ may be omitted and we have

$$\eta = \frac{n}{n - 1} \sum_{\mu=1}^{n-1} A_\mu x'^2_\mu.$$

We sum up, writing again $x_1, \ldots, x_{n-1}$ for $x'_1, \ldots, x'_{n-1}$: the desired
distribution of $\eta$ is that of

$$\eta = \frac{n}{n - 1} \sum_{\mu=1}^{n-1} A_\mu x_\mu^2,$$

where the point $x_1, \ldots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$

Here $A_1, \ldots, A_{n-1}$ are all positive, and together with 0 they are the
characteristic values of the quadratic form

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2 = x_1^2 + 2\sum_{\mu=2}^{n-1} x_\mu^2 + x_n^2 - 2\sum_{\mu=1}^{n-1} x_\mu x_{\mu+1}.$$



DISTRIBUTION OF A RATIO 


371 


3. The characteristic values $A_\mu$; first orientation concerning $\eta$. We have
shown that there exist (counting multiplicities) precisely $n - 1$ positive roots
$A$ of the characteristic equation

$$\operatorname{Det}\begin{pmatrix}
A-1 & 1 & & & & \\
1 & A-2 & 1 & & & \\
 & 1 & A-2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & A-2 & 1 \\
 & & & & 1 & A-1
\end{pmatrix} = 0$$

(the empty places are filled with zeros), and that these roots are the
$A_1, \dots, A_{n-1}$.

Such an $A$ is characterized by the possibility of solving the equations

$$(A-1)x_1 + x_2 = 0,\quad x_1 + (A-2)x_2 + x_3 = 0,\quad x_2 + (A-2)x_3 + x_4 = 0,\ \dots,$$

$$x_{n-2} + (A-2)x_{n-1} + x_n = 0,\quad x_{n-1} + (A-1)x_n = 0,$$

in $x_1, \dots, x_n$ not all equal to zero. Put

$$x_0 = x_1,\qquad x_{n+1} = x_n,$$

and

$$A = 2 - 2\cos\alpha,$$

then these equations become

$$x_{\mu-1} + x_{\mu+1} = 2\cos\alpha \cdot x_\mu \quad\text{for } \mu = 1, 2, \dots, n-1, n.$$

The last equation is satisfied by

$$x_\mu = 2\cos\left(\mu - \tfrac12\right)\alpha \quad\text{for } \mu = 0, 1, 2, \dots, n-1, n, n+1.$$

Now $x_0 = x_1$ is automatically fulfilled, while $x_{n+1} = x_n$ demands
$\cos(n+\frac12)\alpha = \cos(n-\frac12)\alpha$. This is certainly the case when
$(n+\frac12)\alpha = 2k\pi - (n-\frac12)\alpha$ ($k$ any integer), i.e. $\alpha = \frac{k\pi}{n}$.
For no $k = 1, \dots, n-1$ are $x_1, \dots, x_n$ all equal to zero (indeed
$x_1 = 2\cos\frac{k\pi}{2n} > 0$), hence these $k$ give $A$'s of the desired kind.
They are

$$A = 2 - 2\cos\frac{k\pi}{n} = 4\sin^2\frac{k\pi}{2n} \qquad (k = 1, \dots, n-1),$$

and so they are all positive and different from each other. Their number is
$n - 1$. Hence they are precisely $A_1, \dots, A_{n-1}$.
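The eigenvalue argument above can be checked numerically. The following is only a sketch under stated assumptions: the function names are hypothetical, and the matrix is taken to be the symmetric tridiagonal matrix of the quadratic form $\sum (x_{\mu+1} - x_\mu)^2$ (diagonal $1, 2, \dots, 2, 1$, off-diagonal $-1$), for $n \ge 2$. It verifies that $x_\mu = 2\cos(\mu - \frac12)\alpha$ with $\alpha = k\pi/n$ is a characteristic vector belonging to $A = 2 - 2\cos\alpha$.

```python
import math

def successive_difference_matrix(n):
    """Matrix Q of the quadratic form sum_{mu=1}^{n-1} (x_{mu+1} - x_mu)^2, n >= 2."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # End points occur in one difference, interior points in two.
        q[i][i] = 1.0 if i in (0, n - 1) else 2.0
        if i + 1 < n:
            q[i][i + 1] = q[i + 1][i] = -1.0
    return q

def eigenpair(n, k):
    """Claimed characteristic value and vector for alpha = k*pi/n."""
    alpha = k * math.pi / n
    value = 2.0 - 2.0 * math.cos(alpha)
    vector = [2.0 * math.cos((mu - 0.5) * alpha) for mu in range(1, n + 1)]
    return value, vector

def residual(n, k):
    """Largest component of Q x - A x; should vanish up to rounding."""
    q = successive_difference_matrix(n)
    value, x = eigenpair(n, k)
    qx = [sum(q[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(abs(qx[i] - value * x[i]) for i in range(n))

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, eigenpair(6, k)[0], residual(6, k))
```

Running it for $n = 6$ prints the five positive characteristic values together with residuals at rounding level.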






JOHN VON NEUMANN 


So we have shown

$$A_\mu = 2 - 2\cos\frac{\mu\pi}{n} = 4\sin^2\frac{\mu\pi}{2n} \qquad (\mu = 1, \dots, n-1).$$

We can now reformulate the final result of the preceding section.
Let us put

$$\eta = \frac{2n}{n-1}(1 - \varepsilon).$$

Then

$$\varepsilon = \sum_{\mu=1}^{n-1} \cos\frac{\mu\pi}{n}\, x_\mu^2,$$

where the point $x_1, \dots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$

Replacement of $x_\mu$ by $x_{n-\mu}$ carries $\varepsilon$ into $-\varepsilon$. Therefore the distribution of
$\varepsilon$ is symmetric around 0. Hence the mean of $\varepsilon$ is 0. The maximum of $\varepsilon$'s
distribution is clearly $\cos\frac{\pi}{n}$, its minimum is $\cos\frac{(n-1)\pi}{n} = -\cos\frac{\pi}{n}$. We state
these facts, together with their equivalents for $\eta$.

$\varepsilon$ ($\eta$)'s distribution is symmetric around its mean, which is 0 $\left(\frac{2n}{n-1}\right)$. The
maximum of $\varepsilon$ ($\eta$)'s distribution is $\cos\frac{\pi}{n}$
$\left(\frac{2n}{n-1}\left[1 + \cos\frac{\pi}{n}\right] = \frac{4n}{n-1}\cos^2\frac{\pi}{2n}\right)$,
its minimum is $-\cos\frac{\pi}{n}$
$\left(\frac{2n}{n-1}\left[1 - \cos\frac{\pi}{n}\right] = \frac{4n}{n-1}\sin^2\frac{\pi}{2n}\right)$.

Thus it will be easier to obtain information concerning $\eta$ by considering the
distribution of $\varepsilon$, since all odd moments of $\varepsilon$ are zero, etc. The investigation of
$\varepsilon$ instead of $\eta$ was first suggested by B. I. Hart, who also found that the first
four odd moments of $\varepsilon$ vanish. R. H. Kent and B. I. Hart also determined the
minima and maxima of these distributions for certain small values of $n$.


4. Direct computation of the moments. We shall investigate the distribution
law of a quantity

$$\gamma = \sum_{\mu=1}^{m} B_\mu x_\mu^2,$$

where the point $x_1, \dots, x_m$ is equidistributed over the spherical surface

$$\sum_{\mu=1}^{m} x_\mu^2 = 1.$$

(Our above $\varepsilon$ obtains by putting $m = n - 1$ and $B_\mu = \cos\frac{\mu\pi}{n}$.)

We denote the mean of any function

$$f(x_1, \dots, x_m)$$

over the above-mentioned spherical surface (the $x_1, \dots, x_m$ being equidistributed
over it) by

$$\overline{f(x_1, \dots, x_m)}.$$

Our primary objective is to determine the moments of this distribution

$$M_p = \overline{\gamma^p} = \overline{\Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{p}} \qquad (p = 0, 1, 2, \dots).$$

Let us write $\Sigma_m$ for the ($(m-1)$-dimensional) area of the above-mentioned
spherical surface (of the unit sphere in $m$-dimensional Euclidean space).

Now we form the function

$$f(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}
e^{\,z\sum_{\mu=1}^{m} B_\mu x_\mu^2 - \sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m.$$

(This integral, as well as all others which we are going to derive from it, is
obviously convergent, as long as $z$ is sufficiently small. More precisely this is true
when

$$|z| \cdot \operatorname{Max}(|B_1|, \dots, |B_m|) < 1.$$

We shall use them only in the neighborhood of $z = 0$.) Now clearly

$$\Big\{\frac{d^p}{dz^p} f(z)\Big\}_{z=0}
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{p} e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} M_p \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{p} e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m$$

$$= M_p \Sigma_m \int_0^\infty e^{-r^2} r^{2p+m-1}\, dr
= \tfrac12 M_p \Sigma_m \int_0^\infty e^{-u} u^{p + \frac m2 - 1}\, du \;{}^{5}
= \tfrac12 M_p \Sigma_m \Gamma\!\left(p + \tfrac m2\right).$$

5 Introduce the new integration variable $u = r^2$.





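On the other hand

$$f(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{\mu=1}^{m} (1 - B_\mu z) x_\mu^2}\, dx_1 \cdots dx_m
= \prod_{\mu=1}^{m} \int_{-\infty}^{\infty} e^{-(1 - B_\mu z) x^2}\, dx$$

$$= \prod_{\mu=1}^{m} (1 - B_\mu z)^{-1/2} \int_0^\infty e^{-u} u^{-1/2}\, du \;{}^{6}
= \prod_{\mu=1}^{m} (1 - B_\mu z)^{-1/2}\, \Gamma\!\left(\tfrac12\right)
= \Gamma\!\left(\tfrac12\right)^{m} \varphi(z),$$

where

$$\varphi(z) = \prod_{\mu=1}^{m} (1 - B_\mu z)^{-1/2}.$$

Thus

$$\tfrac12 M_p \Sigma_m \Gamma\!\left(p + \tfrac m2\right) = \Gamma\!\left(\tfrac12\right)^{m} \Big\{\frac{d^p}{dz^p} \varphi(z)\Big\}_{z=0}.$$

For $p = 0$ this becomes, since $M_0 = 1$, $\varphi(0) = 1$,

$$\tfrac12 \Sigma_m \Gamma\!\left(\tfrac m2\right) = \Gamma\!\left(\tfrac12\right)^{m}.$$

Dividing the former equation by the latter gives, since
$\Gamma(p + \frac m2)/\Gamma(\frac m2) = \frac m2 \left(\frac m2 + 1\right) \cdots \left(\frac m2 + p - 1\right)$,

$$M_p = \frac{1}{\frac m2 \left(\frac m2 + 1\right) \cdots \left(\frac m2 + p - 1\right)} \Big\{\frac{d^p}{dz^p} \varphi(z)\Big\}_{z=0}.$$

In order to make a practical use of the above formula, we compute

$$\ln \varphi(z) = -\tfrac12 \sum_{\mu=1}^{m} \ln(1 - B_\mu z)
= \tfrac12 \sum_{\mu=1}^{m} \sum_{l=1}^{\infty} \frac{B_\mu^l}{l} z^l
= \sum_{l=1}^{\infty} \Big(\frac{1}{2l} \sum_{\mu=1}^{m} B_\mu^l\Big) z^l.$$

6 Introduce the new integration variable $u = (1 - B_\mu z) x^2$.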





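Write

$$\alpha_l = \frac{1}{2l} \sum_{\mu=1}^{m} B_\mu^l,$$

then

$$\varphi(z) = e^{\alpha_1 z + \alpha_2 z^2 + \alpha_3 z^3 + \cdots}
= 1 + \beta_1 z + \beta_2 z^2 + \beta_3 z^3 + \cdots$$

and so

$$M_p = \frac{1 \cdot 2 \cdots p}{\frac m2 \left(\frac m2 + 1\right) \cdots \left(\frac m2 + p - 1\right)}\, \beta_p.$$

Clearly

$$\beta_1 = \alpha_1,$$

$$\beta_2 = \alpha_2 + \tfrac12 \alpha_1^2,$$

$$\beta_3 = \alpha_3 + \alpha_1 \alpha_2 + \tfrac16 \alpha_1^3,$$

$$\beta_4 = \alpha_4 + \tfrac12 \alpha_2^2 + \alpha_1 \alpha_3 + \tfrac12 \alpha_1^2 \alpha_2 + \tfrac1{24} \alpha_1^4.$$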
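The chain $B_\mu \to \alpha_l \to \beta_p \to M_p$ lends itself to a short numerical sketch. The function names below are hypothetical, and the recurrence $p\,\beta_p = \sum_{j=1}^{p} j\,\alpha_j\,\beta_{p-j}$ is a standard identity for the exponential of a power series, used here in place of writing out each $\beta_p$ by hand; it is not taken from the paper.

```python
import math

def power_sum_coefficients(B, order):
    """alpha_l = (1/(2l)) * sum_mu B_mu**l for l = 1..order (index 0 is unused)."""
    return [0.0] + [sum(b ** l for b in B) / (2 * l) for l in range(1, order + 1)]

def series_coefficients(alpha):
    """beta_p with exp(sum_l alpha_l z^l) = sum_p beta_p z^p.

    Uses p * beta_p = sum_{j=1}^{p} j * alpha_j * beta_{p-j} (assumed standard identity).
    """
    order = len(alpha) - 1
    beta = [1.0] + [0.0] * order
    for p in range(1, order + 1):
        beta[p] = sum(j * alpha[j] * beta[p - j] for j in range(1, p + 1)) / p
    return beta

def moments(B, order):
    """M_p = p! * beta_p / ((m/2)(m/2 + 1)...(m/2 + p - 1)) for p = 0..order."""
    m = len(B)
    beta = series_coefficients(power_sum_coefficients(B, order))
    result = []
    for p in range(order + 1):
        rising = 1.0
        for j in range(p):
            rising *= m / 2 + j
        result.append(math.factorial(p) * beta[p] / rising)
    return result

if __name__ == "__main__":
    n = 7
    B = [math.cos(mu * math.pi / n) for mu in range(1, n)]
    print(moments(B, 4))
```

With $B_\mu = \cos\frac{\mu\pi}{n}$ the odd moments come out as zero up to rounding, in line with the symmetry of $\varepsilon$.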

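In our application (cf. above)

$$B_{m+1-\mu} = -B_\mu.$$

This has the consequence that

$$\alpha_1 = \alpha_3 = \alpha_5 = \cdots = 0.$$

Thus the $z$ functions we compute contain only even powers of $z$ and consequently

$$\beta_1 = \beta_3 = \beta_5 = \cdots = 0,$$

$$M_1 = M_3 = M_5 = \cdots = 0,$$

and

$$\beta_2 = \alpha_2,$$

$$\beta_4 = \alpha_4 + \tfrac12 \alpha_2^2,$$

$$\beta_6 = \alpha_6 + \alpha_2 \alpha_4 + \tfrac16 \alpha_2^3,$$

$$\beta_8 = \alpha_8 + \tfrac12 \alpha_4^2 + \alpha_2 \alpha_6 + \tfrac12 \alpha_2^2 \alpha_4 + \tfrac1{24} \alpha_2^4.$$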





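As mentioned before, we actually have $m = n - 1$ and $B_\mu = \cos\frac{\mu\pi}{n}$. Consequently

$$\alpha_l = \frac{1}{2l} \sum_{\mu=1}^{n-1} \cos^l\frac{\mu\pi}{n},$$

and

$$\sum_{\mu=0}^{n-1} \cos^l\frac{\mu\pi}{n}
= \sum_{\mu=0}^{n-1} \Big(\frac{e^{i\mu\pi/n} + e^{-i\mu\pi/n}}{2}\Big)^{l}
= \frac{1}{2^l} \sum_{k=0}^{l} \binom{l}{k} \sum_{\mu=0}^{n-1} e^{i(2k-l)\mu\pi/n}.$$

The inner sum has obviously these values⁷

$$\sum_{\mu=0}^{n-1} e^{i(2k-l)\mu\pi/n} =
\begin{cases} n & \text{if } k - \tfrac12 l \text{ is divisible by } n, \\ 0 & \text{otherwise.} \end{cases}$$

Also

$$\sum_{\mu=1}^{n-1} \cos^l\frac{\mu\pi}{n} = \sum_{\mu=0}^{n-1} \cos^l\frac{\mu\pi}{n} - 1.$$

Consequently

$$\alpha_l = \frac{1}{2l} \Big[\frac{n}{2^l} \sum_{k}{}' \binom{l}{k} - 1\Big],$$

where $\sum_k'$ extends over those $k = 0, \dots, l$, for which $k - \frac12 l$ is divisible by $n$.

Let us now determine the $k$ occurring in the sum $\sum_k'$ (as above, $k - \frac12 l$
is divisible by $n$). $k = \frac12 l$ is clearly one of them. All others are of the form
$k = \frac12 l \pm hn$, $h = 1, 2, \dots$. The term contributed is the same for $+$ and
for $-$, since

$$\binom{l}{\frac12 l + hn} = \binom{l}{\frac12 l - hn}.$$

So we have

$$\alpha_l = \begin{cases}
0 & \text{for } l \text{ odd}, \\[4pt]
\dfrac{1}{2l} \Big\{\dfrac{n}{2^l} \Big[\dbinom{l}{\frac12 l} + 2\sum_{h} \dbinom{l}{\frac12 l - hn}\Big] - 1\Big\} & \text{for } l \text{ even}.
\end{cases}$$

7 As pointed out above, we need to consider only the even $l$.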





The number of terms which the sum $\sum_h$ contributes depends on the comparative
sizes of $l$ and $n$. The number is clearly

0 for $\frac12 l < n$,

1 for $n \le \frac12 l < 2n$,

2 for $2n \le \frac12 l < 3n$,
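The binomial expression for $\alpha_l$ can be evaluated exactly with rational arithmetic. The sketch below uses hypothetical function names; `alpha` follows the even-$l$ formula of this section (central term plus twice the terms with $hn \le \frac12 l$), and `alpha_numeric` evaluates the defining cosine sum in floating point as an independent cross-check.

```python
from fractions import Fraction
from math import comb, cos, pi

def alpha(l, n):
    """Exact alpha_l for B_mu = cos(mu*pi/n), mu = 1..n-1, via the binomial formula."""
    if l % 2 == 1:
        return Fraction(0)
    half = l // 2
    total = comb(l, half)              # the term k = l/2
    h = 1
    while h * n <= half:               # the terms k = l/2 +- h*n, counted twice
        total += 2 * comb(l, half - h * n)
        h += 1
    return (Fraction(n * total, 2 ** l) - 1) / (2 * l)

def alpha_numeric(l, n):
    """Floating-point value of (1/(2l)) * sum_{mu=1}^{n-1} cos(mu*pi/n)**l."""
    return sum(cos(mu * pi / n) ** l for mu in range(1, n)) / (2 * l)

if __name__ == "__main__":
    for l in (2, 4, 6, 8):
        print(l, [str(alpha(l, n)) for n in range(1, 7)])
```

For $n > \frac12 l$ only the central term contributes, which is where the closed forms linear in $n$ come from; the small-$n$ exceptions arise from the extra terms.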


Explicit formulae follow.⁸

$$\alpha_1 = \alpha_3 = \alpha_5 = \alpha_7 = \alpha_9 = \cdots = 0,$$

$$\alpha_2 = \frac{n-2}{8} \qquad (0 \text{ for } n = 1),$$

$$\alpha_4 = \frac{3n-8}{64} \qquad (0 \text{ for } n = 1, 2),$$

$$\alpha_6 = \frac{5n-16}{192} \qquad \left(0 \text{ for } n = 1, 2;\ \tfrac{1}{384},\ n = 3\right),$$

$$\alpha_8 = \frac{35n-128}{2048} \qquad \left(0 \text{ for } n = 1, 2;\ \tfrac{1}{2048},\ n = 3;\ \tfrac{1}{128},\ n = 4\right),$$

$$\beta_1 = \beta_3 = \beta_5 = \beta_7 = \beta_9 = \cdots = 0,$$

$$\beta_2 = \frac{n-2}{8} \qquad (0 \text{ for } n = 1),$$

$$\beta_4 = \frac{n^2 + 2n - 12}{128} \qquad (0 \text{ for } n = 1, 2),$$

$$\beta_6 = \frac{n^3 + 12n^2 + 8n - 168}{3072} \qquad \left(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024},\ n = 3\right),$$

$$\beta_8 = \frac{n^4 + 28n^3 + 212n^2 - 64n - 3696}{98304} \qquad \left(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768},\ n = 3;\ \tfrac{35}{2048},\ n = 4\right),$$

$$M_1 = M_3 = M_5 = M_7 = M_9 = \cdots = 0,$$

$$M_2 = \frac{8}{(n-1)(n+1)}\, \beta_2 = \frac{n-2}{(n-1)(n+1)} \qquad (0 \text{ for } n = 1),$$

$$M_4 = \frac{384}{(n-1)(n+1)(n+3)(n+5)}\, \beta_4 = \frac{3(n^2 + 2n - 12)}{(n-1)(n+1)(n+3)(n+5)} \qquad (0 \text{ for } n = 1, 2),$$

$$M_6 = \frac{46080}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)}\, \beta_6 = \frac{15(n^3 + 12n^2 + 8n - 168)}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)}$$

$$\left(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024},\ n = 3\right),$$

$$M_8 = \frac{10321920}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)(n+11)(n+13)}\, \beta_8 = \frac{105(n^4 + 28n^3 + 212n^2 - 64n - 3696)}{(n-1)(n+1)(n+3)(n+5)(n+7)(n+9)(n+11)(n+13)}$$

$$\left(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768},\ n = 3;\ \tfrac{112}{21879},\ n = 4\right).$$

8 The author wishes to express his thanks to Miss B. I. Hart for her kind help in carrying
out these computations.


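We conclude this section by obtaining asymptotic formulae for the distribution
of $\varepsilon$ when $n \to \infty$.

In this case our formulae show that all $\alpha_l$ ($l$ even) behave asymptotically like
constant multiples of $n$. It also appears from our formulae for the $\beta_l$ ($l$ even),
that

$$\beta_l = \frac{1}{\left(\frac12 l\right)!}\, \alpha_2^{l/2} + \text{a polynomial in } \alpha_2, \alpha_4, \dots, \alpha_l \text{ of total order} \le \tfrac12 l - 1.$$

Consequently $\frac{1}{(\frac12 l)!} \alpha_2^{l/2}$ is the dominant term in this expression, and so we have
asymptotically

$$\beta_l \sim \frac{1}{\left(\frac12 l\right)!} \left(\frac n8\right)^{l/2}.$$

From this

$$M_l \sim \frac{l!}{\left(\frac n2\right)^{l}} \cdot \frac{1}{\left(\frac12 l\right)!} \left(\frac n8\right)^{l/2}
= \frac{l!}{\left(\frac12 l\right)!} \left(\frac{1}{2n}\right)^{l/2}.$$

Now the normal distribution

$$c_1 e^{-y^2/2\sigma_1^2}\, dy, \qquad c_1 = \frac{1}{\sigma_1 \sqrt{2\pi}},$$

with the mean 0 and the standard deviation $\sigma_1$ has the moments

$$m_l = \int_{-\infty}^{\infty} y^l c_1 e^{-y^2/2\sigma_1^2}\, dy
= \left[1 + (-1)^l\right] 2^{\frac12(l-1)} \sigma_1^{l+1} c_1 \int_0^\infty e^{-u} u^{\frac12(l-1)}\, du.\;{}^{9}$$

This is clearly 0 for $l$ odd, while for $l$ even

$$m_l = 2^{\frac12(l+1)} \sigma_1^{l+1} c_1 \Gamma\!\left(\frac{l+1}{2}\right).$$

For $l = 0$ this becomes, since $m_0 = 1$,

$$1 = 2^{\frac12} \sigma_1 c_1 \Gamma\!\left(\tfrac12\right).$$

Dividing the former equation by the latter gives, since
$\Gamma\!\left(\frac{l+1}{2}\right) / \Gamma\!\left(\frac12\right) = \frac12 \cdot \frac32 \cdots \frac{l-1}{2}$,

$$m_l = 1 \cdot 3 \cdots (l-1)\, \sigma_1^l = \frac{l!}{2^{l/2} \left(\frac12 l\right)!}\, \sigma_1^l
= \frac{l!}{\left(\frac12 l\right)!} \left(\frac{\sigma_1^2}{2}\right)^{l/2}.$$

Comparing the formulae for $M_l$ and for $m_l$ shows that $M_l \sim m_l$ if
$\frac{\sigma_1^2}{2} = \frac{1}{2n}$, i.e. $\sigma_1 = \frac{1}{\sqrt n}$. So we see:

For $n \to \infty$ the distribution of $\varepsilon$ becomes asymptotically normal, with the
mean 0 and the standard deviation $\sigma_1 = \frac{1}{\sqrt n}$. (The same result could be
obtained by applying the general theorems of Liapounoff and others.)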
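How quickly the moments of $\varepsilon$ approach the normal ones can be seen from the closed forms of this section. The sketch below (hypothetical function names; only $M_2$ and $M_4$ are included, valid for $n > 2$) forms the ratio $M_4 / m_4$ with $\sigma_1^2 = 1/n$ in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def exact_moment(l, n):
    """Closed forms of M_2 and M_4 from this section (n > 2 assumed)."""
    if l == 2:
        return Fraction(n - 2, (n - 1) * (n + 1))
    if l == 4:
        return Fraction(3 * (n * n + 2 * n - 12), (n - 1) * (n + 1) * (n + 3) * (n + 5))
    raise ValueError("only l = 2 and l = 4 are tabulated in this sketch")

def normal_moment(l, n):
    """m_l = 1*3*...*(l-1) * sigma^l for even l, with sigma^2 = 1/n."""
    half = l // 2
    return Fraction(factorial(l), factorial(half) * 2 ** half * n ** half)

# Ratio M_4 / m_4 for increasing n; it should tend to 1.
ratios = {n: float(exact_moment(4, n) / normal_moment(4, n)) for n in (10, 100, 1000, 10000)}

if __name__ == "__main__":
    for n, ratio in ratios.items():
        print(n, ratio)
```

The ratio is visibly below 1 for small $n$ and approaches 1 roughly like $1 - 6/n$, so the normal approximation to the fourth moment is slow to become accurate.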


5. The distribution law, general discussion. We return to the quantity $\gamma$,
defined at the beginning of the preceding section, of which our $\varepsilon$ is a special case.
We wish to obtain direct information concerning the distribution law of this $\gamma$.
Since a permutation of the $B_\mu$ is permissible, we arrange them such that

$$B_1 \ge B_2 \ge \cdots \ge B_m.$$

(In the special case $\gamma = \varepsilon$, the $B_\mu = \cos\frac{\mu\pi}{n}$ are given in this arrangement.)

The distribution of $\gamma$ covers obviously the interval

$$B_1 \ge \gamma \ge B_m.$$

And if not $B_1 = \cdots = B_m$, i.e. if $B_1 > B_m$, which we assume to be the case,
then we have obviously a continuous distribution law for $\gamma$ in this interval.
We denote it by $\omega(\gamma)\, d\gamma$.


9 Introduce the new integration variable $u = \frac{y^2}{2\sigma_1^2}$.





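Assume for the moment that $B_m > 0$. Then the quantity

$$\gamma^{-m/2} = \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-m/2}$$

is bounded, and we can therefore form its mean value. This is the $-\frac m2$ moment
of $\gamma$ (cf. the beginning of the preceding section)

$$M_{-m/2} = \overline{\gamma^{-m/2}} = \int_{B_m}^{B_1} \gamma^{-m/2} \omega(\gamma)\, d\gamma.$$

With any two $a > b > 0$ (we shall have $\frac ab \to \infty$ subsequently) form the
quantity

$$t(a, b) = \int \cdots \int_{b^2 \le \sum x_\mu^2 \le a^2} \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{-m/2} dx_1 \cdots dx_m \;{}^{10}
= \int_b^a r^{-m} \Sigma_m r^{m-1}\, dr = \Sigma_m \int_b^a \frac{dr}{r} = \Sigma_m \ln\frac ab.$$

Consider next

$$s(a, b) = \int \cdots \int_{b^2 \le \sum x_\mu^2 / B_\mu \le a^2} \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{-m/2} dx_1 \cdots dx_m$$

$$= \int \cdots \int_{b^2 \le \sum x_\mu^2 \le a^2} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu}\; dx_1 \cdots dx_m \;{}^{11}$$

$$= \int \cdots \int_{b^2 \le \sum x_\mu^2 \le a^2} M_{-m/2} \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu}\; dx_1 \cdots dx_m$$

$$= M_{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu}\; t(a, b).$$

10 Concerning this transformation to polar coordinates and the quantity $\Sigma_m$ cf. the first
part of the preceding section.

11 Replace each variable $x_\mu$ by $\sqrt{B_\mu}\, x_\mu$.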





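On the other hand, a comparison of their respective integration domains makes
it clear that

$$t\big(\sqrt{B_m}\, a, \sqrt{B_1}\, b\big) \le s(a, b) \le t\big(\sqrt{B_1}\, a, \sqrt{B_m}\, b\big).$$

Thus

$$\Sigma_m \ln\frac{\sqrt{B_m}\, a}{\sqrt{B_1}\, b} \le M_{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu}\; \Sigma_m \ln\frac ab \le \Sigma_m \ln\frac{\sqrt{B_1}\, a}{\sqrt{B_m}\, b},$$

i.e.

$$\frac{\ln\frac ab - \frac12 \ln\frac{B_1}{B_m}}{\ln\frac ab} \le M_{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu} \le \frac{\ln\frac ab + \frac12 \ln\frac{B_1}{B_m}}{\ln\frac ab}.$$

Now let $\frac ab \to \infty$, then

$$M_{-m/2} \sqrt{\prod_{\mu=1}^{m} B_\mu} = 1$$

obtains, i.e.

$$\overline{\gamma^{-m/2}} = \int_{B_m}^{B_1} \gamma^{-m/2} \omega(\gamma)\, d\gamma = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}.$$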
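The identity $\overline{\gamma^{-m/2}} = 1/\sqrt{\prod B_\mu}$ is easy to test numerically. In the sketch below (hypothetical function names), the case $m = 2$ is evaluated by the trapezoid rule on the unit circle, and general $m$ by a seeded Monte Carlo average over normalized Gaussian vectors, which is one common way to sample the sphere uniformly and is an assumption of this example rather than anything used in the paper.

```python
import math
import random

def inverse_moment_circle(b1, b2, steps=4096):
    """m = 2: mean of gamma**(-1) over the unit circle by the periodic trapezoid rule."""
    total = 0.0
    for j in range(steps):
        theta = 2.0 * math.pi * j / steps
        gamma = b1 * math.cos(theta) ** 2 + b2 * math.sin(theta) ** 2
        total += 1.0 / gamma
    return total / steps

def inverse_moment_monte_carlo(B, samples=100_000, seed=0):
    """General m: average of gamma**(-m/2) over random points of the unit sphere."""
    rng = random.Random(seed)
    m = len(B)
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(m)]
        r2 = sum(v * v for v in g)
        # gamma evaluated at the normalized point g / |g|
        gamma = sum(b * v * v for b, v in zip(B, g)) / r2
        total += gamma ** (-m / 2)
    return total / samples

if __name__ == "__main__":
    print(inverse_moment_circle(3.0, 1.0), 1.0 / math.sqrt(3.0))
    B = [4.0, 3.0, 2.0, 1.0]
    print(inverse_moment_monte_carlo(B), 1.0 / math.sqrt(math.prod(B)))
```

All $B_\mu$ must be positive here, as in the text's temporary assumption $B_m > 0$.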


We now drop the assumption $B_m > 0$. We consider instead a real number
$z$ with $z < B_m$. Replace each $B_\mu$ by $B_\mu - z$. Then the one with $\mu = m$ becomes
$> 0$. And $\gamma$ is obviously replaced by $\gamma - z$. Consequently our above
equation is now valid in the form

$$\overline{(\gamma - z)^{-m/2}} = \int_{B_m}^{B_1} (\gamma - z)^{-m/2} \omega(\gamma)\, d\gamma = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}}.$$

Let now $z$ be a complex variable. The second term of the above equation is
a (locally) analytical function of $z$, except in the (real) interval $B_1 \ge z \ge B_m$.
The third term, too, is a (locally) analytical function of $z$, except at the (real)
points $B_1, \dots, B_m$. Consequently both are one-valued analytical functions
of $z$ in the simply connected domain which obtains from the complex $z$ plane by
exclusion of the (real) half line

$$z \ge B_m.$$

Hence the equation

$$\int_{B_m}^{B_1} (\gamma - z)^{-m/2} \omega(\gamma)\, d\gamma = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}} \tag{1}$$

which holds for all (real) $z < B_m$, remains true for all complex $z$ of the above
domain.¹²

We observe next that $\omega(\gamma)$ is an analytical function of $\gamma$ in $B_1 \ge \gamma \ge B_m$,
whenever $\gamma \ne B_1, \dots, B_m$. This is easily established by using any multiple
integral expression for $\omega(\gamma)$ which, while hard to evaluate explicitly, puts this
analyticity into evidence.¹³


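12 $(\gamma - z)^{-m/2}$ and the factors $(B_\mu - z)^{-1/2}$ of $\frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}}$ are those branches of these
analytical functions which are (real and) $> 0$ when $z$ is (real and) $< B_m$. When $m$ is even
(as it will be, cf. below) the domain of analyticity is somewhat more extended, but we need
not discuss this.

13 The computation which follows gives the desired analyticity in a simple way, and also
makes it clear why the analyticity fails at $\gamma = B_1, \dots, B_m$.

Consider a $\gamma \ne B_1, \dots, B_m$ in $B_1 \ge \gamma \ge B_m$. The probability of $\gamma' \le \gamma$ is
$p(\gamma) = \int_{B_m}^{\gamma} \omega(\gamma')\, d\gamma'$, and we may establish its analyticity instead of that of $p'(\gamma) = \omega(\gamma)$.

Obviously $p(\gamma)$ is equally the probability of $\sum_{\mu=1}^{m} B_\mu x_\mu^2 \le \gamma \sum_{\mu=1}^{m} x_\mu^2$, if the $x_1, \dots, x_m$ are
equidistributed over a spherical surface $\sum_{\mu=1}^{m} x_\mu^2 = r^2$, with any given $r > 0$.
Our hypotheses concerning $\gamma$ imply $B_\nu > \gamma > B_{\nu+1}$ for a suitable $\nu = 1, \dots, m-1$.
Consider now the expression

$$f(\gamma) = \int \cdots \int_{\sum B_\mu x_\mu^2 \le \gamma \sum x_\mu^2} e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m.$$

Transforming to polar coordinates, we obtain

$$f(\gamma) = \int_0^\infty e^{-r^2} \Sigma_m p(\gamma) r^{m-1}\, dr = \Sigma_m \int_0^\infty e^{-r^2} r^{m-1}\, dr \cdot p(\gamma).$$

($\Sigma_m$ as before.) Hence it suffices to establish the analyticity of $f(\gamma)$. Now on the other
hand

$$f(\gamma) = \int \cdots \int_{\sum_{\mu=1}^{\nu} (B_\mu - \gamma) x_\mu^2 \,\le\, \sum_{\mu=\nu+1}^{m} (\gamma - B_\mu) x_\mu^2} e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m$$

$$= \frac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - \gamma|}} \int \cdots \int_{\sum_{\mu=1}^{\nu} w_\mu^2 \,\le\, \sum_{\mu=\nu+1}^{m} w_\mu^2} e^{-\sum_{\mu=1}^{m} w_\mu^2 / |B_\mu - \gamma|}\, dw_1 \cdots dw_m.$$

(We introduced the new variables $w_\mu = \sqrt{|B_\mu - \gamma|}\, x_\mu$.) And this expression is clearly
analytical in $\gamma$, since $B_\nu > \gamma > B_{\nu+1}$.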





We shall need only the fact that $\omega(\gamma)$ possesses $\frac m2$ continuous derivatives at
these places. ($m$ will be assumed to be even, cf. below.) Its behavior at
$\gamma = B_1, \dots, B_m$ will follow from our subsequent results in all cases where we
need it.

In order to determine $\omega(\gamma)$ from (1), as we now propose to do, it is very
convenient to assume that $m$ is even. We therefore make this assumption, and
shall maintain it throughout most of what follows.

Consider a $\gamma_0 \ne B_1, \dots, B_m$ in $B_1 \ge \gamma_0 \ge B_m$. Then $B_\nu > \gamma_0 > B_{\nu+1}$ for
a suitable $\nu = 1, \dots, m-1$. Now put

$$z = \gamma_0 + it \qquad (t \text{ real and} > 0),$$

form (1), take the imaginary parts of both sides, and let $t \to 0$.

Consider first the left-hand side of (1). Since $\omega(\gamma)$ possesses $\frac m2$ continuous
derivatives at $\gamma = \gamma_0$, we have

$$\omega(\gamma) = \sum_{k=0}^{\frac m2 - 1} \theta_k (\gamma - \gamma_0)^k + e(\gamma) (\gamma - \gamma_0)^{m/2}$$

with a bounded $e(\gamma)$. Clearly

$$\theta_k = \frac{1}{k!} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = \gamma_0}.$$

Thus, since $\omega(\gamma)$ is real, all $\theta_k$ are real and $e(\gamma)$ is also real.

Compute now the contribution of each one of the $\frac m2 + 1$ terms in the above
expression for $\omega(\gamma)$ to the imaginary part of the left-hand side of (1).

The last term, $e(\gamma) \cdot (\gamma - \gamma_0)^{m/2}$, gives

$$\Im \int_{B_m}^{B_1} (\gamma - \gamma_0 - it)^{-m/2} e(\gamma) (\gamma - \gamma_0)^{m/2}\, d\gamma
= \Im \int_{B_m}^{B_1} \Big(\frac{\gamma - \gamma_0}{\gamma - \gamma_0 - it}\Big)^{m/2} e(\gamma)\, d\gamma.$$

The integrand is uniformly bounded, and so the reality conditions cause the
entire expression to $\to 0$ for $t \to 0$. Hence the contribution of this term is zero
for $t \to 0$.

The other $\frac m2$ terms correspond to $k = 0, 1, \dots, \frac m2 - 1$, the $k$ term being

$$\Im \int_{B_m}^{B_1} (\gamma - \gamma_0 - it)^{-m/2} \theta_k (\gamma - \gamma_0)^k\, d\gamma
= \theta_k \Im \int_{B_m}^{B_1} \frac{(\gamma - \gamma_0)^k}{(\gamma - \gamma_0 - it)^{m/2}}\, d\gamma$$

$$= \theta_k \Im \int_{B_m}^{B_1} \frac{\sum_{h=0}^{k} \binom kh (it)^h (\gamma - \gamma_0 - it)^{k-h}}{(\gamma - \gamma_0 - it)^{m/2}}\, d\gamma$$

$$= \theta_k \sum_{h=0}^{k} \binom kh \Im \Big\{(it)^h \int_{B_m}^{B_1} (\gamma - \gamma_0 - it)^{k-h-\frac m2}\, d\gamma\Big\}.$$




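The exponent $k - h - \frac m2$ in the integral is always $\le -1$, and it is $= -1$ if and
only if $k = \frac m2 - 1$, $h = 0$. Consider first a term where this is not the case,
i.e. where the exponent $k - h - \frac m2 < -1$. For such a term the expression
$\Im(\cdots)$ becomes

$$\Im \Big\{(it)^h \frac{1}{k - h - \frac m2 + 1} \Big[(\gamma - \gamma_0 - it)^{k-h-\frac m2+1}\Big]_{\gamma = B_m}^{\gamma = B_1}\Big\}.$$

For $t \to 0$ the last factors are bounded and real, and so the entire expression $\to 0$:
for $h = 0$ because of the reality conditions, for $h > 0$ because of $(it)^h \to 0$.

Thus only the term $k = \frac m2 - 1$, $h = 0$ can contribute something else than zero
for $t \to 0$.

Now this term is equal to

$$\theta_{\frac m2 - 1}\, \Im \Big\{\big[\ln(\gamma - \gamma_0 - it)\big]_{\gamma = B_m}^{\gamma = B_1}\Big\},$$

and for $t \to 0$ this converges to¹⁴

$$\theta_{\frac m2 - 1}\, \pi = \frac{\pi}{\left(\frac m2 - 1\right)!} \Big\{\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)\Big\}_{\gamma = \gamma_0}.$$

Thus the imaginary part of the entire left-hand side of (1) converges for
$t \to 0$ to this expression.

The right-hand side of (1) is easier to discuss. The imaginary part under
consideration is now

$$\Im \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - \gamma_0 - it)}} = \Im \prod_{\mu=1}^{m} (B_\mu - \gamma_0 - it)^{-1/2}.$$

Considering 12 (its $z$ is our $\gamma_0 + it$), this converges for $t \to 0$ to

$$\Im \prod_{\mu=1}^{\nu} (B_\mu - \gamma_0)^{-1/2} \prod_{\mu=\nu+1}^{m} i (\gamma_0 - B_\mu)^{-1/2}
= \Im \frac{i^{\,m-\nu}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - \gamma_0|}}.\;{}^{15}$$

14 This evaluation $\Im \big[\ln(\gamma - \gamma_0 - it)\big]_{\gamma = B_m}^{\gamma = B_1} \to \pi$ is based on $t > 0$, and the fact that $\gamma$ moves
on the real axis from $B_m$ to $B_1$. It has no connection with 15.

15 The square roots of the (real and) $> 0$ quantities
$B_\mu - \gamma_0$ ($\mu = 1, \dots, \nu$), $\gamma_0 - B_\mu$ ($\mu = \nu + 1, \dots, m$), and $\prod_{\mu=1}^{m} |B_\mu - \gamma_0|$
are taken to be $> 0$.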





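If $\nu$ (hence $m - \nu$) is even, then this is zero. If $\nu$ (hence $m - \nu$) is odd, then
this is equal to $(-1)^{\frac12(m-\nu-1)} \frac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - \gamma_0|}}$. Thus (1) becomes the following
equation.

$$\frac{\pi}{\left(\frac m2 - 1\right)!} \Big\{\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)\Big\}_{\gamma = \gamma_0}
= \begin{cases}
0 & \text{if } \nu \text{ is even}, \\[4pt]
(-1)^{\frac12(m-\nu-1)} \dfrac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - \gamma_0|}} & \text{if } \nu \text{ is odd}.
\end{cases}$$

Simplifying this, and writing $\gamma$ for $\gamma_0$, and also restating the definition of $\nu$ gives

$$\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)
= \begin{cases}
0 & \text{if } \nu \text{ is even}, \\[4pt]
(-1)^{\frac12(m-\nu-1)} \dfrac{\left(\frac m2 - 1\right)!}{\pi} \dfrac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - \gamma|}} & \text{if } \nu \text{ is odd},
\end{cases} \tag{2}$$

$$B_\nu > \gamma > B_{\nu+1}, \qquad \nu = 1, \dots, m-1.$$

Observe finally, that if we put

$$\mathfrak{A}(\gamma) = \prod_{\mu=1}^{m} (\gamma - B_\mu),$$

then this product has $\nu$ factors $< 0$ ($\mu = 1, \dots, \nu$), while the others are $> 0$.
So

$$\mathfrak{A}(\gamma) \gtrless 0 \quad\text{for } \nu\ \begin{matrix}\text{even}\\ \text{odd}\end{matrix},$$

and in the latter case

$$\prod_{\mu=1}^{m} |B_\mu - \gamma| = -\mathfrak{A}(\gamma).$$

It is clear how we may now rewrite (2).

We are now in a position to determine the behavior of $\omega(\gamma)$ at $\gamma = B_1, \dots, B_m$
too, since we know how its $\left(\frac m2 - 1\right)$-th derivative behaves in the immediate
vicinity of these places. (2) shows that it is singular there, and that the nature
of the singularity depends on the number of the $\mu$, for which $B_\mu$ is equal to the
$\gamma$ in question, i.e. on the multiplicity of this root of our polynomial $\mathfrak{A}(\gamma)$.

In our actual application (to $\gamma = \varepsilon$, cf. the beginning of this section) the
$B_\mu$ are pairwise different, i.e. all root multiplicities of $\mathfrak{A}(\gamma)$ are equal to one. A
further special case, which has a certain interest of its own, is when the $B_\mu$ are
equal two by two, but otherwise different, i.e. all root multiplicities of $\mathfrak{A}(\gamma)$ are
equal to two. In the discussion which follows we shall therefore assume that
one or the other of these two cases occurs.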

In the first case $\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)$ has on each side of a $\gamma = B_\mu$ one of these two
behaviors: It is identically zero, or it is singular, of the type $\frac{1}{\sqrt{|B_\mu - \gamma|}}$;
it is at any rate integrable. Consequently $\frac{d^{\frac m2 - 2}}{d\gamma^{\frac m2 - 2}} \omega(\gamma)$ is continuous on each
side of $\gamma = B_\mu$, i.e. for both $\gamma = B_\mu \pm 0$. Successive integrations give now
that all $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous for both $\gamma = B_\mu \pm 0$.

In the second case we have $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So
the $\nu$ with $B_\nu > \gamma > B_{\nu+1}$ is necessarily even, and $\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)$ is identically zero
for all $\gamma$ of (2). Consequently $\frac{d^{\frac m2 - 2}}{d\gamma^{\frac m2 - 2}} \omega(\gamma)$ is again continuous on each side of
$\gamma = B_\mu$, i.e. for both $\gamma = B_\mu \pm 0$. Successive integrations show again that all
$\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous for both $\gamma = B_\mu \pm 0$.

We must therefore discuss only how much the $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$,
change from $\gamma = B_\mu - 0$ to $\gamma = B_\mu + 0$.

Let us return to the procedure by which we derived (2) from (1). We put
again

$$z = \gamma_0 + it \qquad (t \text{ real and} > 0)$$

and let $t \to 0$. But we consider now (1) itself (and not merely its imaginary
part), and we choose a $\gamma_0 = B_\nu$.

Consider first the left-hand side of (1), always disregarding terms which stay
bounded for $t \to 0$. Then we can replace the integral $\int_{B_m}^{B_1}$ of (1) by any $\int_{B_\nu - a}^{B_\nu + a}$
with any fixed $a > 0$, and this is equal to

$$\int_{B_\nu - a}^{B_\nu - 0} + \int_{B_\nu + 0}^{B_\nu + a}.$$

We choose this $a > 0$ so small that no $B_\mu \ne B_\nu$ lies between $B_\nu - a$ and $B_\nu + a$.
I.e. all $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous from $B_\nu - a$ to
$B_\nu - 0$ and also from $B_\nu + 0$ to $B_\nu + a$.





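This being the case, we can evaluate the above sum of two integrals by $\frac m2 - 1$
successive partial integrations. Thus we get

$$-\sum_{k=0}^{\frac m2 - 2} \frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} \Big[(\gamma - B_\nu - it)^{-\frac m2 + 1 + k} \frac{d^k}{d\gamma^k} \omega(\gamma)\Big]_{\gamma = B_\nu - a}^{\gamma = B_\nu - 0}$$

$$-\sum_{k=0}^{\frac m2 - 2} \frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} \Big[(\gamma - B_\nu - it)^{-\frac m2 + 1 + k} \frac{d^k}{d\gamma^k} \omega(\gamma)\Big]_{\gamma = B_\nu + 0}^{\gamma = B_\nu + a}$$

$$+\frac{1}{\left(\frac m2 - 1\right)!} \int_{B_\nu - a}^{B_\nu + a} (\gamma - B_\nu - it)^{-1} \frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)\, d\gamma.$$

In the first two lines the $\gamma = B_\nu \pm a$ terms are bounded for $t \to 0$, therefore
only the $\gamma = B_\nu \pm 0$ terms need be kept. Then the first two lines give

$$\sum_{k=0}^{\frac m2 - 2} \frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} (-it)^{-\frac m2 + 1 + k} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}^{\gamma = B_\nu + 0}$$

up to terms which stay bounded for $t \to 0$. Consider now the third line. We
know that the $\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)$ in its integrand can be majorized by $\frac{c_2}{\sqrt{|\gamma - B_\nu|}}$ (for
a suitable constant $c_2$, cf. our discussion preceding the present one). Thus the
integral in question is majorized by

$$\int_{B_\nu - a}^{B_\nu + a} |\gamma - B_\nu - it|^{-1} c_2 |\gamma - B_\nu|^{-1/2}\, d\gamma,$$

hence a fortiori by

$$\int_{-\infty}^{\infty} |\gamma - B_\nu - it|^{-1} c_2 |\gamma - B_\nu|^{-1/2}\, d\gamma
= c_2 t^{-1/2} \int_{-\infty}^{\infty} |u - i|^{-1} |u|^{-1/2}\, du \;{}^{16}$$

$$= c_2 t^{-1/2} \int_{-\infty}^{\infty} \frac{du}{\sqrt{(u^2 + 1)\, |u|}}
= 4 c_2 t^{-1/2} \int_0^\infty \frac{dv}{\sqrt{v^4 + 1}}.\;{}^{17}$$

16 Introduce the new integration variable $u = \frac{\gamma - B_\nu}{t}$.

17 Introduce the new integration variable $v = \sqrt{|u|}$.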





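Since the last integration is obviously finite, the entire expression is $O(t^{-1/2})$ for
$t \to 0$.

Consequently the left-hand side of (1) is equal to

$$\sum_{k=0}^{\frac m2 - 2} \frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} (-it)^{-\frac m2 + 1 + k} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}^{\gamma = B_\nu + 0} + O(t^{-1/2}),$$

for $t \to 0$. (For $B_\nu = B_1$ or $B_m$ the $\frac{d^k}{d\gamma^k} \omega(\gamma)$ at $\gamma = B_\nu + 0$ or $B_\nu - 0$,
respectively, must obviously be taken to be zero.)

Consider now the right-hand side of (1).

We first suppose the $B_\mu$ are pairwise different. The right-hand side in question is

$$\frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - B_\nu - it)}}, \quad\text{i.e. } O(t^{-1/2}).$$

Secondly let us consider $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So we
may assume $\nu = 2\lambda$ $\left(\lambda = 1, \dots, \frac m2\right)$. The right-hand side of (1) becomes now
a rational function, $\frac{1}{\prod_{k=1}^{m/2} (B_{2k} - z)}$. (The sign is determined by 12.) So in our case
it is

$$\frac{1}{(B_{2\lambda} - B_{2\lambda} - it) \prod_{k \ne \lambda} (B_{2k} - B_{2\lambda} - it)},$$

i.e.

$$\frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}\, (-it)^{-1} + O(1).$$

Comparing these with our above expression gives therefore (for $t \to 0$)

$$\sum_{k=0}^{\frac m2 - 2} \frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} (-it)^{-\frac m2 + 1 + k} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}^{\gamma = B_\nu + 0}$$

$$= \begin{cases}
O(t^{-1/2}) & \text{in the first case}, \\[4pt]
\dfrac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}\, (-it)^{-1} + O(t^{-1/2}) & \text{in the second case}.
\end{cases}$$

In this formula the left-hand side is a polynomial in $(-it)^{-1}$. Hence the $O(t^{-1/2})$
terms on the right-hand side must vanish, and otherwise all powers of $-it$ must
have the same coefficient on both sides. Consequently

$$\frac{\left(\frac m2 - 2 - k\right)!}{\left(\frac m2 - 1\right)!} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}^{\gamma = B_\nu + 0}$$

must vanish, except in the second case for the one value of $k$ with
$-\frac m2 + 1 + k = -1$, i.e. $k = \frac m2 - 2$. So, with this one exception, we have

$$\Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu + 0} = \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}.$$

And in the exceptional case (second case, $\nu = 2\lambda$)

$$\Big\{\frac{d^{\frac m2 - 2}}{d\gamma^{\frac m2 - 2}} \omega(\gamma)\Big\}_{\gamma = B_\nu - 0}^{\gamma = B_\nu + 0}
= \left(\tfrac m2 - 1\right)!\, \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

Thus in the first case all derivatives $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$, are
continuous even at $\gamma = B_1, \dots, B_m$.

In the second case the same is true for $k = 0, 1, \dots, \frac m2 - 3$, but the derivative
with $k = \frac m2 - 2$ behaves differently for $\gamma = B_1, \dots, B_m$. Indeed, for
$\gamma = B_{2\lambda - 1} = B_{2\lambda}$ $\left(\lambda = 1, \dots, \frac m2\right)$ this derivative is continuous for both
$\gamma = B_{2\lambda} \pm 0$, but it increases from $B_{2\lambda} - 0$ to $B_{2\lambda} + 0$ by

$$\left(\tfrac m2 - 1\right)!\, \frac{(-1)^{\frac m2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

(At $\gamma = B_1 + 0$ and $B_m - 0$ the $\frac{d^k}{d\gamma^k} \omega(\gamma)$ must be thought to continue with the
value zero.)

These rules, together with (2), determine $\omega(\gamma)$ completely.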


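6. First special case. We consider the first special case, where the $B_\mu$ are
pairwise different. We immediately specialize further, to $\gamma = \varepsilon$, i.e. $m = n - 1$,
$B_\mu = \cos\frac{\mu\pi}{n}$ ($\mu = 1, \dots, n-1$). (Cf. the beginning of the preceding section.)

Since $m$ must be even, $n$ must be odd. The rules of section 5 determine
$\omega(\gamma)$; in particular all derivatives $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac{n-1}{2} - 2$, are
everywhere continuous, beginning and ending with zero at $\gamma = B_1$ and $B_{n-1}$,
respectively.

In the even intervals

$$B_2 \ge \gamma \ge B_3,\quad B_4 \ge \gamma \ge B_5,\ \dots,\quad B_{n-3} \ge \gamma \ge B_{n-2},$$

the derivative $\frac{d^{\frac12(n-1) - 1}}{d\gamma^{\frac12(n-1) - 1}} \omega(\gamma)$ is zero, i.e. $\omega(\gamma)$ is a polynomial of degree
$\frac12(n-1) - 2$. In the odd intervals

$$B_1 \ge \gamma \ge B_2,\quad B_3 \ge \gamma \ge B_4,\ \dots,\quad B_{n-2} \ge \gamma \ge B_{n-1},$$

we have

$$\frac{d^{\frac12(n-1) - 1}}{d\gamma^{\frac12(n-1) - 1}} \omega(\gamma) = \pm \frac{\left(\frac12(n-1) - 1\right)!}{\pi} \frac{1}{\sqrt{-\mathfrak{A}(\gamma)}}$$

(the sign $\pm$ is alternating: $(-1)^{\frac12(n-1) - 1}, (-1)^{\frac12(n-1) - 2}, \dots, (-1)^0$), where

$$\mathfrak{A}(\gamma) = \prod_{\mu=1}^{n-1} \Big(\gamma - \cos\frac{\mu\pi}{n}\Big).$$

Another expression for $\mathfrak{A}(\gamma)$ may be found by the following method.
Clearly

$$\frac{\sin(n\varphi)}{\sin\varphi} = \frac{e^{in\varphi} - e^{-in\varphi}}{e^{i\varphi} - e^{-i\varphi}} = \sum_{\nu=0}^{n-1} e^{i(n-1-2\nu)\varphi}$$

is a polynomial of $\cos\varphi = \frac12 (e^{i\varphi} + e^{-i\varphi})$ of degree $n - 1$, with the highest
coefficient $2^{n-1}$. For $\varphi = \frac{\mu\pi}{n}$, $\mu = 1, \dots, n-1$, $\sin(n\varphi) = 0$, $\sin\varphi \ne 0$, hence
$\frac{\sin(n\varphi)}{\sin\varphi}$, as a polynomial in $\cos\varphi$, has the same roots as $\mathfrak{A}(\gamma)$. $\mathfrak{A}(\gamma)$ is a
polynomial of degree $n - 1$ with the highest coefficient 1. Consequently

$$\mathfrak{A}(\cos\varphi) = \frac{1}{2^{n-1}} \frac{\sin(n\varphi)}{\sin\varphi}.$$

This formula allows one to compute $\mathfrak{A}(\gamma)$ quickly, examples are

$n = 3$: $\mathfrak{A}(\gamma) = \gamma^2 - \frac14$,

$n = 5$: $\mathfrak{A}(\gamma) = \gamma^4 - \frac34 \gamma^2 + \frac1{16}$,

$n = 7$: $\mathfrak{A}(\gamma) = \gamma^6 - \frac54 \gamma^4 + \frac38 \gamma^2 - \frac1{64}$.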
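The polynomial with roots $\cos\frac{\mu\pi}{n}$ can be expanded numerically and compared both with the three examples and with the closed form $\frac{\sin(n\varphi)}{2^{n-1}\sin\varphi}$. This is a sketch with hypothetical function names, working in floating point.

```python
import math

def root_polynomial(n):
    """Coefficients, highest power first, of prod_{mu=1}^{n-1} (y - cos(mu*pi/n))."""
    coeffs = [1.0]
    for mu in range(1, n):
        root = math.cos(mu * math.pi / n)
        # Multiply the current polynomial by (y - root).
        coeffs = [a - root * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def evaluate(coeffs, y):
    """Horner evaluation of a polynomial given highest power first."""
    value = 0.0
    for c in coeffs:
        value = value * y + c
    return value

def closed_form(n, phi):
    """sin(n*phi) / (2**(n-1) * sin(phi)), the value of the polynomial at cos(phi)."""
    return math.sin(n * phi) / (2 ** (n - 1) * math.sin(phi))

if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, [round(c, 12) for c in root_polynomial(n)])
```

The printed coefficient lists reproduce the three examples above, with the odd powers vanishing up to rounding.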

The number of odd intervals, on which integrations must be carried out,
is $\frac12(n-1)$, but since those which are symmetric with respect to 0 require the
same computations, only $\frac14(n-1)$ or $\frac14(n+1)$ must be considered. So there are
1, 1, 2, $\dots$ such intervals for $n = 3, 5, 7, \dots$ respectively. The integrals are
first elementary (arcsin), then elliptic, then hyperelliptic.

Numerical computations for $n = 3$ are immediate; for $n = 5, 7$ they have
been carried out with considerable precision by B. I. Hart.
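As an illustration of why the case $n = 3$ is immediate, here is a short worked example (not part of the original text) obtained by inserting $m = 2$, $\nu = 1$ into formula (2) of section 5. The derivative of order $\frac m2 - 1 = 0$ is $\omega(\gamma)$ itself, the sign factor is $(-1)^0 = 1$, and $\mathfrak{A}(\gamma) = \gamma^2 - \frac14$, so that

```latex
\omega(\gamma) = \frac{0!}{\pi}\,\frac{1}{\sqrt{-\mathfrak{A}(\gamma)}}
             = \frac{1}{\pi\sqrt{\tfrac14 - \gamma^{2}}},
\qquad -\tfrac12 < \gamma < \tfrac12 ,
\qquad
\int_{-1/2}^{1/2} \omega(\gamma)\,d\gamma
  = \frac{1}{\pi}\Big[\arcsin(2\gamma)\Big]_{-1/2}^{1/2} = 1 ,
\qquad
\int_{-1/2}^{1/2} \gamma^{2}\,\omega(\gamma)\,d\gamma = \tfrac18 .
```

The second moment $\frac18$ agrees with $M_2 = \frac{n-2}{(n-1)(n+1)}$ of section 4 at $n = 3$. The same density follows directly from $\varepsilon = \frac12(x_1^2 - x_2^2) = \frac12\cos 2\theta$ with $\theta$ uniform on the circle.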

At $\gamma = B_\mu$, $\frac{d^{\frac12(n-1) - 1}}{d\gamma^{\frac12(n-1) - 1}} \omega(\gamma)$ has a singularity of the type $\frac{1}{\sqrt{|B_\mu - \gamma|}}$ (cf. the end
of section 5), while all $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac12(n-1) - 2$, are continuous.
At $\gamma = B_1$ and $B_{n-1}$, in particular, they are zero. Hence it follows by successive
integrations that the order of vanishing of $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac12(n-1) - 2$,
at $\gamma = B_1$ and $B_{n-1}$ is $\left(\frac12(n-1) - 1\right) - k - \frac12 = \frac n2 - 2 - k$. In particular
for $k = 0$ we find that at its maximum and at its minimum ($B_1$ and $B_{n-1}$,
i.e. $\pm\cos\frac{\pi}{n}$) the order of vanishing of $\omega(\gamma)$ is $\frac n2 - 2$.¹⁸

Since $\omega(\gamma)$ has this property, and since it is obviously an even function of $\gamma$,
R. H. Kent has suggested approximating it by a series expansion of the form

$$\omega(\gamma) = \sum_{h=0}^{\infty} a_h \Big(\cos^2\frac{\pi}{n} - \gamma^2\Big)^{\frac n2 - 2 + h}. \tag{3}$$

Computations by B. I. Hart, not yet published, have shown that even the use
of the first four terms ($h = 0, 1, 2, 3$, the $a_h$ being determined by the condition
of normalization and by the first three even moments of the actual distribution
given in section 4) gives excellent approximations. The use of the formula (3)
suggests itself likewise for even values of $n$.


7. Second special case. We consider now the second special case, where
$B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. This has no immediate bearing on
our original problem (cf. the preceding section), but we shall nevertheless discuss
it for the two following reasons. First, it is hoped that the reader will find an
independent interest in the simple and complete results which can be obtained in
this case. Second, there are various modifications of our original problem, which
lead to this case. For example let the $x_1, \dots, x_n$ in our original problem, as
described in section 1, be complex numbers instead of real ones, replacing all
squares by absolute value squares. Then one verifies easily that all characteristic
values $A_1, \dots, A_{n-1}$ are doubled, and so our first case goes over into our
second case. (This amounts to replacing our quadratic forms by Hermitian
forms, cf. 4.) It is easy to imagine two-dimensional problems where this set-up
is natural.

We put $C_\lambda = B_{2\lambda - 1} = B_{2\lambda}$ for $\lambda = 1, \dots, \frac m2$, so that $C_1 > C_2 > \cdots > C_{m/2}$
are the only restrictions imposed.

Every $\gamma$ in $B_1 \ge \gamma \ge B_m$, i.e. in $C_1 \ge \gamma \ge C_{m/2}$, lies in an interval
$C_\lambda \ge \gamma \ge C_{\lambda+1}$, i.e. $B_{2\lambda} \ge \gamma \ge B_{2\lambda+1}$. That is the $\nu$ of (2) is always even, and so
$\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)$ is zero in every one of these intervals. Therefore $\omega(\gamma)$ is a polynomial of degree
$\frac m2 - 2$ in every one of these intervals. We have already shown that $\omega(\gamma)$ is
not the same polynomial in every interval. Thus $\omega(\gamma)$ is represented by $\frac m2 - 1$
polynomials of degree $\frac m2 - 2$ in the $\frac m2 - 1$ intervals

$$C_1 \ge \gamma \ge C_2,\ \dots,\ C_{\frac m2 - 1} \ge \gamma \ge C_{\frac m2}.$$

18 We omit the simple discussion of $n = 3$, which must be excluded from this result.

We could try to obtain explicit expressions for these polynomials by a direct
application of the results at the close of section 5. A characterization of the
distribution can, however, be obtained in a more elegant way by an indirect
procedure.

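Consider an arbitrary function $\Omega(\gamma)$. We wish to express its mean

$$\overline{\Omega(\gamma)} = \int_{C_{m/2}}^{C_1} \Omega(\gamma) \omega(\gamma)\, d\gamma.$$

If we can do this for all $\Omega(\gamma)$ then the distribution is completely characterized.

We select first an $\left(\frac m2 - 1\right)$-fold primitive function of $\Omega(\gamma)$, i.e. a function $\Phi(\gamma)$
with

$$\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \Phi(\gamma) = \Omega(\gamma).$$

Of course $\Phi(\gamma)$ is determined only up to an additive polynomial of degree $\frac m2 - 2$
in $\gamma$.

Now the above expectation value becomes

$$\overline{\Omega(\gamma)} = \int_{C_{m/2}}^{C_1} \frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \Phi(\gamma)\, \omega(\gamma)\, d\gamma
= \sum_{\lambda=1}^{\frac m2 - 1} \int_{C_{\lambda+1} + 0}^{C_\lambda - 0} \frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \Phi(\gamma)\, \omega(\gamma)\, d\gamma.$$

Since all $\frac{d^k}{d\gamma^k} \omega(\gamma)$, $k = 0, 1, \dots, \frac m2 - 2$, are continuous from $C_{\lambda+1} + 0$ to $C_\lambda - 0$
for all $\lambda = 1, \dots, \frac m2 - 1$, we can evaluate each integral of the above sum by
$\frac m2 - 1$ successive partial integrations. Thus the following expression obtains:

$$\sum_{\lambda=1}^{\frac m2 - 1} \sum_{k=0}^{\frac m2 - 2} (-1)^k \Big[\frac{d^{\frac m2 - 2 - k}}{d\gamma^{\frac m2 - 2 - k}} \Phi(\gamma) \cdot \frac{d^k}{d\gamma^k} \omega(\gamma)\Big]_{\gamma = C_{\lambda+1} + 0}^{\gamma = C_\lambda - 0}$$

$$+ (-1)^{\frac m2 - 1} \int_{C_{m/2}}^{C_1} \Phi(\gamma) \frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)\, d\gamma.$$

Considering the definition of $\Phi(\gamma)$ as an $\left(\frac m2 - 1\right)$-fold primitive function, the
$\frac{d^{k'}}{d\gamma^{k'}} \Phi(\gamma)$, $k' = 0, 1, \dots, \frac m2 - 2$, are everywhere continuous. This corresponds
to $k' = \frac m2 - 2 - k$, $k = 0, 1, \dots, \frac m2 - 2$. Hence the first line can be rewritten
as

$$-\sum_{\lambda=1}^{\frac m2} \sum_{k=0}^{\frac m2 - 2} (-1)^k \Big\{\frac{d^{\frac m2 - 2 - k}}{d\gamma^{\frac m2 - 2 - k}} \Phi(\gamma)\Big\}_{\gamma = C_\lambda} \Big\{\frac{d^k}{d\gamma^k} \omega(\gamma)\Big\}_{\gamma = C_\lambda - 0}^{\gamma = C_\lambda + 0}.$$

(For $C_\lambda = C_1$ or $C_{m/2}$ the $\frac{d^k}{d\gamma^k} \omega(\gamma)$ at $\gamma = C_1 + 0$ or $C_{m/2} - 0$, respectively, must
obviously be taken to be zero.) Owing to the results of section 5 all terms
with $k = 0, 1, \dots, \frac m2 - 3$ vanish, and the term with $k = \frac m2 - 2$ gives

$$-\sum_{\lambda=1}^{\frac m2} (-1)^{\frac m2 - 2}\, \Phi(C_\lambda)\, (-1)^{\frac m2 - \lambda} \left(\tfrac m2 - 1\right)! \times \frac{1}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}$$

$$= \sum_{\lambda=1}^{\frac m2} (-1)^{\lambda - 1} \left(\tfrac m2 - 1\right)! \frac{1}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}\, \Phi(C_\lambda).$$

The second line vanishes, since $\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \omega(\gamma)$ is zero everywhere, as observed above.

Finally

$$\overline{\Omega(\gamma)} = \left(\tfrac m2 - 1\right)! \sum_{\lambda=1}^{\frac m2} (-1)^{\lambda - 1} \frac{\Phi(C_\lambda)}{\prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}.$$

For

$$\mathfrak{B}(z) = \prod_{k=1}^{\frac m2} (z - C_k)$$

we have

$$\Big\{\frac{d}{dz} \mathfrak{B}(z)\Big\}_{z = C_\lambda} = \prod_{k \ne \lambda} (C_\lambda - C_k)
= (-1)^{\lambda - 1} \prod_{k=1}^{\lambda - 1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k).$$

Therefore the above formula can also be written

$$\overline{\Omega(\gamma)} = \left(\tfrac m2 - 1\right)! \sum_{\lambda=1}^{\frac m2} \frac{\Phi(C_\lambda)}{\Big\{\frac{d}{dz} \mathfrak{B}(z)\Big\}_{z = C_\lambda}}.$$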


Observe that the right-hand side of the above formula (which can also be
easily expressed in terms of determinants) is a well-known approximate
expression for $\frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \Phi(\gamma)$, as a (repeated) difference quotient of the values
$\Phi(C_\lambda)$, $\lambda = 1, \dots, \frac m2$. It is therefore very satisfactory that this expression gives the
mean of

$$\Omega(\gamma) = \frac{d^{\frac m2 - 1}}{d\gamma^{\frac m2 - 1}} \Phi(\gamma).$$
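The difference-quotient characterization can be checked against the moment formula of section 4 by taking monomials for the test function. In the sketch below the function names are hypothetical, the test function is $\gamma^p$ with primitive $\gamma^{p+r}\,p!/(p+r)!$ for $r = \frac m2 - 1$, and the series recurrence for the $\beta_p$ is an assumed standard identity rather than something stated in the paper. Everything is done in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def mean_by_difference_quotient(C, primitive):
    """(m/2 - 1)! * sum over lambda of Phi(C_lambda) / prod_{k != lambda} (C_lambda - C_k)."""
    half = len(C)                      # number of distinct values, i.e. m/2
    total = Fraction(0)
    for lam, c in enumerate(C):
        denom = Fraction(1)
        for k, other in enumerate(C):
            if k != lam:
                denom *= Fraction(c) - Fraction(other)
        total += primitive(Fraction(c)) / denom
    return factorial(half - 1) * total

def moment_from_series(C, p):
    """M_p of section 4 with every C_lambda counted twice (m = 2 * len(C))."""
    B = [Fraction(c) for c in C for _ in range(2)]
    m = len(B)
    alpha = [Fraction(0)] + [sum(b ** l for b in B) / (2 * l) for l in range(1, p + 1)]
    beta = [Fraction(1)] + [Fraction(0)] * p
    for q in range(1, p + 1):
        beta[q] = sum(j * alpha[j] * beta[q - j] for j in range(1, q + 1)) / q
    rising = Fraction(1)
    for j in range(p):
        rising *= Fraction(m, 2) + j
    return factorial(p) * beta[p] / rising

def monomial_primitive(p, order):
    """An order-fold primitive of y**p, namely y**(p + order) * p! / (p + order)!."""
    scale = Fraction(factorial(p), factorial(p + order))
    return lambda y: scale * y ** (p + order)

if __name__ == "__main__":
    C = [3, 2, 1]
    for p in range(5):
        print(p, mean_by_difference_quotient(C, monomial_primitive(p, len(C) - 1)),
              moment_from_series(C, p))
```

Both columns agree for every $p$, which is what the formula predicts when the test function is $\gamma^p$.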


Appendix. We return to the normal distribution of $x_1, \ldots, x_n$ as described in section 1, and to the quantities $s^2$, $\delta^2$, $\eta$ given there. We denote means with respect to that distribution by $\overline{(\cdots)}$.

It was observed by B. I. Hart and mentioned by J. D. Williams³ by comparing the known expressions for their moments, that every moment of $\eta = \delta^2/s^2$ is the quotient of the corresponding moments of $\delta^2$ and of $s^2$. That is

$$\overline{\eta^p} = \frac{\overline{\delta^{2p}}}{\overline{s^{2p}}} \qquad (p = 0, 1, 2, \ldots).$$

This indicates some kind of independence relation involving $\delta^2$ and $s^2$. The considerations which follow are intended to clarify this situation.

The above relation may be written

$$\overline{s^{2p}\eta^p} = \overline{s^{2p}} \cdot \overline{\eta^p}$$

or, more generally,

$$\overline{s^{q}\eta^p} = \overline{s^{q}} \cdot \overline{\eta^p}.$$

We shall prove this by showing that $s$ and $\eta$ are statistically independent.

We can, as in section 2, make the mean $\xi = 0$, i.e. obtain the $x_1, \ldots, x_n$ distribution law

$$c^n e^{-\sum_{\mu=1}^{n} x_\mu^2/2\sigma^2}\,dx_1 \cdots dx_n.$$

And then, again as in section 2, perform a linear orthogonal transformation, carrying $x_1, \ldots, x_n$ into, say $x'_1, \ldots, x'_n$, which leaves the distribution law in its original form

$$c^n e^{-\sum_{\mu=1}^{n} x'^2_\mu/2\sigma^2}\,dx'_1 \cdots dx'_n,$$


i 1 V - ' n 


v = 


n 

n % 
n " 1 g x' 2 

p-i 


and makes 





Since $x'_n$ does not occur in $s^2$, $\eta$ we must use only the $x'_1, \ldots, x'_{n-1}$ distribution law

$$c^{n-1} e^{-\sum_{\mu=1}^{n-1} x'^2_\mu/2\sigma^2}\,dx'_1 \cdots dx'_{n-1}.$$

Now we introduce polar coordinates with respect to $x'_1, \ldots, x'_{n-1}$. These consist of a radius $r$ with

$$r^2 = \sum_{\mu=1}^{n-1} x'^2_\mu$$

and $n - 2$ angular variables $\varphi_1, \ldots, \varphi_{n-2}$, which can be chosen in various ways, and which we need not describe more closely. At any rate

$$dx'_1 \cdots dx'_{n-1} = r^{n-2}\,dr\; w(\varphi_1, \ldots, \varphi_{n-2})\,d\varphi_1 \cdots d\varphi_{n-2}$$

where we need not determine the weight function $w(\varphi_1, \ldots, \varphi_{n-2})$. Consequently the distribution law is

$$c^{n-1} e^{-r^2/2\sigma^2}\, r^{n-2}\,dr\; w(\varphi_1, \ldots, \varphi_{n-2})\,d\varphi_1 \cdots d\varphi_{n-2}.$$

Thus the coordinate $r$ and the coordinates $\varphi_1, \ldots, \varphi_{n-2}$ are independent of each other.

Next

$$s^2 = \frac{1}{n}\, r^2$$

and $\eta$ is a homogeneous function of $x'_1, \ldots, x'_{n-1}$ of degree zero, i.e. it is independent of $r$. So $s$ is a function of $r$ alone, and $\eta$ is a function of $\varphi_1, \ldots, \varphi_{n-2}$ alone. Consequently $s$ and $\eta$ likewise are independent.
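The independence of $s$ and $\eta$ can be illustrated numerically. The following is only a rough Monte Carlo sketch, not part of the original argument. It assumes the definitions $s^2 = \frac{1}{n}\sum (x_\mu - \bar x)^2$ and $\delta^2 = \frac{1}{n-1}\sum (x_{\mu+1} - x_\mu)^2$, which are those of section 1 as far as can be judged from the title of the paper; the function name `simulate` and its parameters are illustrative. It checks the case $p = 1$ of the moment relation, i.e. that the mean of $\delta^2$ is close to the product of the means of $\eta$ and $s^2$.

```python
import random


def simulate(n=5, trials=20000, sigma=1.0, seed=0):
    """Monte Carlo means of s^2, delta^2 and eta = delta^2 / s^2 for normal samples.

    Assumed definitions (not restated in this section of the paper):
      s^2     = (1/n)     * sum (x_i - mean)^2
      delta^2 = (1/(n-1)) * sum (x_{i+1} - x_i)^2
    """
    rng = random.Random(seed)
    sum_s2 = sum_d2 = sum_eta = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(x) / n
        s2 = sum((xi - mean) ** 2 for xi in x) / n
        d2 = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
        sum_s2 += s2
        sum_d2 += d2
        sum_eta += d2 / s2
    return sum_s2 / trials, sum_d2 / trials, sum_eta / trials


mean_s2, mean_d2, mean_eta = simulate()
# If s and eta are independent, mean(delta^2) = mean(eta * s^2) = mean(eta) * mean(s^2).
print("mean of delta^2        :", mean_d2)
print("mean(eta) * mean(s^2)  :", mean_eta * mean_s2)
```

The two printed values should agree up to sampling error; agreement of first moments alone is of course only a consistency check, not a proof of independence.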


Added in proof: 

After this manuscript was completed, Dr. T. Koopmans informed the author 
of several results of his own, which he obtained in connection with other statistical 
investigations. They have many points of contact with this investigation, and 
will appear in the near future in the Annals of Mathematical Statistics. The 
author wishes to express his thanks to Dr. T. Koopmans for his communications. 



SOME EXAMPLES OF ASYMPTOTICALLY MOST POWERFUL TESTS 


By Abraham Wald¹
Columbia University 

1. Introduction. In a previous paper² the author gave the definition of an asymptotically most powerful test and has shown that the commonly used tests, based on the maximum likelihood estimate, are asymptotically most powerful.

In this paper some further examples of asymptotically most powerful tests will be given. Let us first restate the definition of an asymptotically most powerful test. Let $f(x, \theta)$ be the probability density of a variate $x$ involving an unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by means of $n$ independent observations $x_1, \ldots, x_n$ on $x$ we have to choose a region of rejection $W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the probability that the sample point $E = (x_1, \ldots, x_n)$ will fall in $W_n$ under the assumption that $\theta$ is the true value of the parameter. For any region $U_n$ of the $n$-dimensional sample space denote by $g(U_n)$ the greatest lower bound of $P(U_n \mid \theta)$. For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least upper bound of

$$P(U_n \mid \theta) - P(T_n \mid \theta).$$

In all that follows we shall denote a region of the $n$-dimensional sample space by a capital letter with the subscript $n$.

Definition 1. A sequence $\{W_n\}$ ($n = 1, 2, \ldots$, ad inf.) of regions is said to be an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \alpha$ the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$$

holds.

Definition 2: A sequence $\{W_n\}$ ($n = 1, 2, \ldots$, ad inf.) of regions is said to be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$ on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n \to \infty} g(W_n) = \alpha$, and if for any sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n \to \infty} g(Z_n) = \alpha$, the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \le 0$$

holds.


¹ Research under a grant-in-aid of the Carnegie Corporation of New York.

² "Asymptotically most powerful tests of statistical hypotheses," Annals of Math. Stat., Vol. 12 (1941).




Consider the expression

$$y_n(\theta) = \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta). \tag{1}$$

Let $W'_n$ be the region defined by the inequality $y_n(\theta_0) > c'_n$, $W''_n$ defined by the inequality $y_n(\theta_0) < c''_n$, and $W_n$ defined by the inequality $|y_n(\theta_0)| > c_n$, where the constants $c'_n$, $c''_n$ and $c_n$ are chosen such that

$$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

It will be shown in this paper that under certain restrictions on the probability density $f(x, \theta)$ the sequence $\{W'_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ if $\theta$ takes only values $> \theta_0$. Similarly $\{W''_n\}$ is an asymptotically most powerful test if $\theta$ takes only values $< \theta_0$. Finally $\{W_n\}$ is an asymptotically most powerful unbiased test if $\theta$ can take any real value.

Another example of an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$, as it will be shown, is the critical region of type A in the Neyman-Pearson theory of testing hypotheses. This fact gives a strong justification for the use of the critical region of type A.
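As an illustration of the statistic $y_n(\theta)$ in (1) and of the one-sided region $y_n(\theta_0) > c'_n$, the following minimal sketch assumes a particular model that is not taken from the paper: the normal density with unknown mean $\theta$ and unit variance, for which $\frac{\partial}{\partial\theta}\log f(x, \theta) = x - \theta$. Under $\theta_0$ the statistic is then exactly standard normal, so $c'_n$ is a normal quantile. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def score_statistic(xs, theta0):
    """y_n(theta0) = n^(-1/2) * sum d/dtheta log f(x_a, theta0).

    Assumed model (for illustration only): f(x, theta) is the N(theta, 1)
    density, so d/dtheta log f(x, theta) = x - theta.
    """
    n = len(xs)
    return sum(x - theta0 for x in xs) / math.sqrt(n)


def reject_one_sided(xs, theta0, alpha=0.05):
    """Region W'_n: reject theta = theta0 when y_n(theta0) > c'_n."""
    # In the assumed model y_n(theta0) is N(0, 1) under theta0,
    # so c'_n is the upper alpha-quantile of the standard normal law.
    c_prime = NormalDist().inv_cdf(1.0 - alpha)
    return score_statistic(xs, theta0) > c_prime


sample = [0.5, 1.5, -0.2, 0.8]
print("y_n(theta0):", score_statistic(sample, 0.0))
print("reject     :", reject_one_sided(sample, 0.0))
```

For other densities only the derivative of $\log f$ changes; the critical value then has to be taken from the limiting distribution given in Proposition 2 rather than from an exact one.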


2. Assumptions on the density function. Let $\omega$ be a subset of the real axis. Denote by $\theta^*$ a real variable which takes only values in $\omega$ and let $\theta$ be a variable which can take any real value. For any function $\psi(x)$ we denote by $E_\theta \psi(x)$ the expected value of $\psi(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$$E_\theta \psi(x) = \int_{-\infty}^{+\infty} \psi(x) f(x, \theta)\,dx.$$

For any $x$, for any positive $\delta$ and for any real value $\theta_1$ denote by $\varphi_1(x, \theta_1, \delta)$ the greatest lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\frac{\partial^2}{\partial\theta^2} \log f(x, \theta)$ in the interval $\theta_1 - \delta < \theta < \theta_1 + \delta$. In all that follows the symbol $\theta^*_i$, for any integer $i$, will denote a value of $\theta^*$, i.e., $\theta^*_i$ is a point of $\omega$.

We say that a value $\theta$ lies in the $\epsilon$-neighborhood of $\omega$ if there exists a value $\theta^*$ such that $|\theta - \theta^*| < \epsilon$.

Throughout the paper the following assumptions on $f(x, \theta)$ will be made.

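Assumption 1: For any pair of sequences $\{\theta_n\}$ and $\{\theta^*_n\}$ ($n = 1, 2, \ldots$, ad inf.) for which

$$\lim_{n = \infty} E_{\theta_n} \frac{\partial}{\partial\theta} \log f(x, \theta^*_n) = 0$$

also

$$\lim_{n = \infty} (\theta_n - \theta^*_n) = 0.$$

Furthermore there exists a positive $\epsilon$ such that $E_\theta \left[\frac{\partial}{\partial\theta} \log f(x, \theta_1)\right]^2$ is a bounded function of $\theta$ and $\theta_1$, $E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta_1)$ is a continuous function of $\theta$ and $\theta_1$ and $E_{\theta_1} \left[\frac{\partial}{\partial\theta} \log f(x, \theta_1)\right]^2 = d(\theta_1)$ has a positive lower bound, where $\theta_1$ can take any value in the $\epsilon$-neighborhood of $\omega$.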

Assumption 2: There exists a positive $k_0$ such that $E_{\theta_1} \varphi_1(x, \theta_2, \delta)$ and $E_{\theta_1} \varphi_2(x, \theta_2, \delta)$ are uniformly continuous functions in the domain $D$ defined as follows: the variables $\theta_1$ and $\theta_2$ may take any value in the $k_0$-neighborhood of $\omega$ and $\delta$ may take any value for which $|\delta| < k_0$. Furthermore it is assumed that

$$E_{\theta_1} [\varphi_i(x, \theta_2, \delta)]^2 \qquad (i = 1, 2)$$

are bounded functions of $\theta_1$, $\theta_2$ and $\delta$ in $D$.

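Assumption 3: There exists a positive $k_0$ such that

$$\int_{-\infty}^{+\infty} \frac{\partial}{\partial\theta} f(x, \theta)\,dx = \int_{-\infty}^{+\infty} \frac{\partial^2}{\partial\theta^2} f(x, \theta)\,dx = 0$$

for all $\theta$ in the $k_0$-neighborhood of $\omega$.

Assumption 3 means simply that we may differentiate with respect to $\theta$ under the integral sign. In fact,

$$\int_{-\infty}^{+\infty} f(x, \theta)\,dx = 1,$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta} \int_{-\infty}^{+\infty} f(x, \theta)\,dx = \frac{\partial^2}{\partial\theta^2} \int_{-\infty}^{+\infty} f(x, \theta)\,dx = 0.$$

Differentiating under the integral sign we obtain the relations in Assumption 3.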
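Assumption 4: There exists a positive $k_0$ and a positive $\eta$ such that

$$E_\theta \left| \frac{\partial}{\partial\theta} \log f(x, \theta) \right|^{2+\eta}$$

is a bounded function of $\theta$ in the $k_0$-neighborhood of $\omega$.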


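3. Some propositions. Proposition 1: To any positive $\beta$ there exists a positive $\gamma$ such that

$$\lim_{n = \infty} P\left\{ \left| \frac{y_n(\theta^*)}{\sqrt{n}} \right| > \gamma \;\Big|\; \theta \right\} = 1$$

uniformly in $\theta^*$ and for all $\theta$ for which $|\theta - \theta^*| > \beta$.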

Proof: From Assumption 1 it follows that $E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta^*)$ has a positive lower bound in the domain $|\theta - \theta^*| > \beta$. Since according to Assumption 1 $E_\theta \left[\frac{\partial}{\partial\theta} \log f(x, \theta^*)\right]^2$ is a bounded function of $\theta$ and $\theta^*$, Proposition 1 easily follows.
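Proposition 2: There exists a positive $\epsilon$ such that

$$\lim_{n = \infty} P[y_n(\theta) < t \mid \theta] = N(t \mid \theta)$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$ where

$$d(\theta) = -E_\theta \frac{\partial^2}{\partial\theta^2} \log f(x, \theta) = E_\theta \left[ \frac{\partial}{\partial\theta} \log f(x, \theta) \right]^2 \tag{2}$$

$$N(t \mid \theta) = \frac{1}{\sqrt{2\pi d(\theta)}} \int_{-\infty}^{t} e^{-v^2/2d(\theta)}\,dv. \tag{3}$$

Proposition 2 follows easily from Assumptions 3 and 4 and the general limit theorems.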

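Proposition 3: There exists a positive $\epsilon$ such that for any bounded sequence $\{\mu_n\}$

$$\lim_{n = \infty} \left\{ P\left[ y_n(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] - \int_{-\infty}^{t} e^{\mu_n v - \frac{1}{2}\mu_n^2 d(\theta)}\,dN(v \mid \theta) \right\} = 0$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$.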

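Proof: We have

$$y_n\left(\theta + \frac{\mu_n}{\sqrt{n}}\right) = y_n(\theta) + \frac{\mu_n}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n) \tag{4}$$

where $\theta_n$ lies in the interval $\left[\theta, \theta + \frac{\mu_n}{\sqrt{n}}\right]$. From Assumption 2 and the above equation we easily obtain

$$\lim_{n = \infty} \left\{ P\left[ y_n\left(\theta + \frac{\mu_n}{\sqrt{n}}\right) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] - P\left[ y_n(\theta) - \mu_n d(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{5}$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. From Proposition 2 and (5) we get

$$\lim_{n = \infty} \left\{ \int_{-\infty}^{t} dN(v \mid \theta) - P\left[ y_n(\theta) < t + \mu_n d(\theta) \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0,$$

that is,

$$\lim_{n = \infty} \left\{ \int_{-\infty}^{t - \mu_n d(\theta)} dN(v \mid \theta) - P\left[ y_n(\theta) < t \;\Big|\; \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. This proves Proposition 3.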
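The limit in Proposition 3 can be read as follows: under the alternative $\theta + \mu/\sqrt{n}$ the statistic $y_n(\theta)$ is asymptotically normal with mean $\mu d(\theta)$ and variance $d(\theta)$, since $e^{\mu v - \frac{1}{2}\mu^2 d}\,dN(v \mid \theta)$ is the element of that normal law. The sketch below checks this identity numerically and evaluates the resulting limiting power of a one-sided region $y_n(\theta) > c$. It is an illustration under the stated reading, not a part of the proof; the function names, the integration bounds and the step count are arbitrary choices.

```python
import math
from statistics import NormalDist


def limiting_cdf(t, mu, d):
    """Limit of P[y_n(theta) < t | theta + mu/sqrt(n)], written as a N(mu*d, d) d.f."""
    return NormalDist(mu * d, math.sqrt(d)).cdf(t)


def tilted_integral(t, mu, d, lower=-30.0, steps=20000):
    """Midpoint-rule value of the integral of exp(mu*v - mu^2*d/2) dN(v) over (lower, t).

    N is the N(0, d) distribution function of formula (3); `lower` stands in
    for minus infinity and is an arbitrary numerical cut-off.
    """
    base = NormalDist(0.0, math.sqrt(d))
    h = (t - lower) / steps
    total = 0.0
    for k in range(steps):
        v = lower + (k + 0.5) * h
        total += math.exp(mu * v - 0.5 * mu * mu * d) * base.pdf(v)
    return total * h


def local_power_one_sided(mu, d, alpha=0.05):
    """Limiting probability of y_n(theta) > c under theta + mu/sqrt(n), c chosen for size alpha."""
    c = math.sqrt(d) * NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - limiting_cdf(c, mu, d)


print(tilted_integral(1.0, 0.8, 2.0), limiting_cdf(1.0, 0.8, 2.0))
print([local_power_one_sided(mu, 2.0) for mu in (0.0, 0.5, 1.0)])
```

At $\mu = 0$ the limiting power reduces to the size $\alpha$, and it increases with $\mu$, which is the behaviour used in the theorems of section 4.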

Proposition 4: There exists a positive $\epsilon$ such that for any positive $\gamma$ and for any sequence $\{\mu_n\}$ for which $\lim_{n = \infty} |\mu_n| = \infty$

$$\lim_{n = \infty} P\left[ |y_n(\theta^*)| > \gamma \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] = 1$$

uniformly in $\theta^*$.

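Proof: If there exists a positive $\beta$ such that

$$\left| \frac{\mu_n}{\sqrt{n}} \right| > \beta \quad \text{for almost all } n,$$

Proposition 4 follows from Proposition 1. Hence we have to consider only the case $\lim_{n = \infty} \frac{\mu_n}{\sqrt{n}} = 0$. Since

$$E_{\theta^* + (\mu_n/\sqrt{n})} \left[ y_n\left(\theta^* + \frac{\mu_n}{\sqrt{n}}\right) \right] = 0,$$

we get from (4)

$$E_{\theta^* + (\mu_n/\sqrt{n})} [y_n(\theta^*)] + \mu_n E_{\theta^* + (\mu_n/\sqrt{n})} \left[ \frac{\sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n)}{n} \right] = 0. \tag{7}$$

Since $\lim \frac{\mu_n}{\sqrt{n}} = 0$, we have on account of Assumption 2

$$\lim_{n = \infty} E_{\theta^* + (\mu_n/\sqrt{n})} \left[ \frac{\sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n)}{n} \right] = E_{\theta^*} \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) = -E_{\theta^*} \left[ \frac{\partial}{\partial\theta} \log f(x, \theta^*) \right]^2 = -d(\theta^*)$$

uniformly in $\theta^*$. According to Assumption 1 $d(\theta^*)$ has a positive lower bound; hence on account of $\lim |\mu_n| = \infty$ we obtain from (7)

$$\lim_{n = \infty} \left| E_{\theta^* + (\mu_n/\sqrt{n})} [y_n(\theta^*)] \right| = \infty \tag{8}$$

uniformly in $\theta^*$. The variance of $y_n(\theta^*)$ is equal to the variance of $\frac{\partial}{\partial\theta} \log f(x, \theta^*)$. On account of Assumption 1 the variance of $\frac{\partial}{\partial\theta} \log f(x, \theta^*)$ (under the assumption that $\theta^* + \frac{\mu_n}{\sqrt{n}}$ is the true value of the parameter) is a bounded function. Hence Proposition 4 is proved on account of (8).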

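Proposition 5. Let $\{W_n(\theta^*)\}$ be a sequence of regions of size $\alpha$, i.e. $P[W_n(\theta^*) \mid \theta^*] = \alpha$, and let $V_n(\theta^*, v)$ be the region defined by the inequality $y_n(\theta^*) < v$. Let $U_n(\theta^*, v)$ be the intersection of $V_n(\theta^*, v)$ and $W_n(\theta^*)$ and denote $P[U_n(\theta^*, v) \mid \theta^*]$ by $F_n(v \mid \theta^*)$. Denote furthermore $P\left[ W_n(\theta^*) \mid \theta^* + \frac{\mu}{\sqrt{n}} \right]$ by $G(\theta^*, \mu, n)$. If $\{\theta^*_n\}$ and $\{\mu_n\}$ are two sequences such that $\lim_{n = \infty} d(\theta^*_n) = d$; $\lim_{n = \infty} F_n(v \mid \theta^*_n) = F(v)$ and $\lim_{n = \infty} \mu_n = \mu$ then

$$\lim_{n = \infty} G(\theta^*_n, \mu_n, n) = \int_{-\infty}^{+\infty} e^{\mu v - \frac{1}{2}\mu^2 d}\,dF(v).$$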

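Proof: Let $\lim \mu_n = \mu$ and consider the Taylor expansion

$$\sum_\alpha \log f\left(x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}}\right) = \sum_\alpha \log f(x_\alpha, \theta^*) + \frac{\mu_n}{\sqrt{n}} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) + \frac{\mu_n^2}{2n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta'_n) \tag{9}$$

where $\theta'_n$ lies in the interval $\left[\theta^*, \theta^* + \frac{\mu_n}{\sqrt{n}}\right]$. From this we easily get on account of Assumption 2 and the fact that $\{\mu_n\}$ is bounded

$$\log \prod_{\alpha=1}^{n} \frac{f\left(x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}}\right)}{f(x_\alpha, \theta^*)} = \mu_n y_n(\theta^*) - \tfrac{1}{2}\mu_n^2 d(\theta^*) + \epsilon(\theta^*, n) \tag{10}$$

where for arbitrary positive $\eta$

$$\lim_{n = \infty} P\left\{ |\epsilon(\theta^*, n)| < \eta \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right\} = 1 \tag{11}$$

uniformly in $\theta^*$. Denote by $R_n(\theta^*)$ the region defined by

$$|\epsilon(\theta^*, n)| < \eta, \qquad \eta > 0. \tag{12}$$

On account of (11) we have

$$\lim_{n = \infty} P\left[ R_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] = 1 \tag{13}$$

uniformly in $\theta^*$. Denote the intersection of $R_n(\theta^*)$ and $W_n(\theta^*)$ by $Q_n(\theta^*)$, and the intersection of $R_n(\theta^*)$ and $U_n(\theta^*, v)$ by $T_n(\theta^*, v)$. Furthermore denote $P[T_n(\theta^*, v) \mid \theta^*]$ by $\bar F_n(v \mid \theta^*)$. Then we have

$$e^{-\eta} \int_{-\infty}^{t} e^{\mu_n v - \frac{1}{2}\mu_n^2 d(\theta^*)}\,d\bar F_n(v \mid \theta^*) \le P\left[ T_n(\theta^*, t) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] \le e^{\eta} \int_{-\infty}^{t} e^{\mu_n v - \frac{1}{2}\mu_n^2 d(\theta^*)}\,d\bar F_n(v \mid \theta^*) \tag{14}$$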





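for all values of $t$ and $\theta^*$. Furthermore we obviously have

$$\lim_{n = \infty} \left\{ G(\theta^*, \mu_n, n) - P\left[ Q_n(\theta^*) \;\Big|\; \theta^* + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{15}$$

uniformly in $\theta^*$, and

$$\lim_{n = \infty} \left[ F_n(t \mid \theta^*) - \bar F_n(t \mid \theta^*) \right] = 0 \tag{16}$$

uniformly in $\theta^*$ and $t$. Since $\eta$ may be chosen arbitrarily small, it follows from (14) and (15) that to any $\epsilon > 0$, $\eta$ may be chosen such that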
!%+«> 


(17) 


lim sup 


/ ■rcc 

CO 


- 2 


for any sequence (0*) 


To each $\epsilon$ let $L_\epsilon$ be a positive number such that $L_\epsilon$ depends only on $\epsilon$ and


(18) 


f L ‘ edN(t 1 9 *) + r dN(t 1 0 *) < ~ 


for all $n$ and for all values of $\theta^*$. Since $d(\theta^*)$ has a positive lower and a finite upper bound, it is easy to verify that such an $L_\epsilon$ exists. From (18) and Proposition 3 it follows

~i 


lim sup \ P 


(19) 


y n (0*) < ~L, 10* + 

■vnJ 


+ P Vn(0*) > L, | + --£=• 

L V n J 


for any arbitrary sequence $\{\theta^*_n\}$. Since the difference $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ is a subset of the difference $V_n(\theta^*, t_2) - V_n(\theta^*, t_1)$ and since $T_n(\theta^*, t_2) - T_n(\theta^*, t_1)$ is a subset of $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ for $t_2 > t_1$, we get from (18) and (19)


lim sup < P 


( 20 ) 


and 


( 21 ) 


T U n (el, -L t ) | 6$ + 

V n. 


] + p [f„(*U) | 0* + 


lira sup < P 

n-too i, L 




Tn(fil , -L.) I 0: + - 7=1 
V nJ 


+ P 


\ qM I C + - 7 - 
L VrJ 


- P T n (d* n , L,) | 9* + \ 

for any sequence $\{\theta^*_n\}$. On account of (14) we get from (21)

(22) eT’ lim sup { r L ‘ d F n (t | 0*) + dP„(i | 0^)) < i. 

rt—►<* [J—ao j L t J 2 





From (17) and (22) we obtain 


(23) 


lim sup 


<?(£,*.,») - r'e^-WWdFM Itf) 



for any sequence $\{\theta^*_n\}$. Consider now the sequence $\{\theta^*_n\}$ which satisfies the conditions of Proposition 5. Since $F_n(t \mid \theta^*_n)$ converges to $F(t)$ uniformly in $t$, on account of (16) also $\bar F_n(t \mid \theta^*_n)$ converges to $F(t)$ uniformly in $t$. Hence we obtain from (23)


(24) 


lim sup 

n —*oo 



dF ^ 



Since $\epsilon$ and $\eta$ may be chosen arbitrarily small, Proposition 5 follows from (24).


4. Some theorems and corollaries. Theorem 1. Denote by $S_n(\theta^*)$ the region defined by the inequality $y_n(\theta^*) > A_n(\theta^*)$ where $A_n(\theta^*)$ is chosen such that $P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region $W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper bound of $P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$ with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted to values $> \theta^*$. Then for any sequence $\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$

Proof. Assume that Theorem 1 is not true. Then there exists a sequence of integers $\{n'\}$, a sequence $\{\theta^*_{n'}\}$ and a sequence $\{\theta_{n'}\}$ ($\theta_{n'} > \theta^*_{n'}$) such that

$$\lim_{n' = \infty} \left\{ P[W_{n'}(\theta^*_{n'}) \mid \theta_{n'}] - P[S_{n'}(\theta^*_{n'}) \mid \theta_{n'}] \right\} = \delta > 0. \tag{25}$$

On account of Proposition 2 and Assumption 2 the sequence $\{A_{n'}(\theta^*_{n'})\}$ is bounded. Then it follows easily from (25) and Proposition 4 (taking into account that $E_\theta \frac{\partial}{\partial\theta} \log f(x, \theta^*) > 0$ for $\theta > \theta^*$) that

$$(\theta_{n'} - \theta^*_{n'})\sqrt{n'} = \mu_{n'} > 0 \tag{26}$$

must be bounded. Denote by $\{n''\}$ a subsequence of $\{n'\}$ such that

$$\lim d(\theta^*_{n''}) = d, \tag{27}$$

$$\lim \mu_{n''} = \mu, \text{ and} \tag{28}$$

$$\lim F_{n''}(t \mid \theta^*_{n''}) = F(t) \tag{29}$$

uniformly in $t$ where

$$F_n(t \mid \theta^*) = P[U_n(\theta^*, t) \mid \theta^*]$$

and $U_n(\theta^*, t)$ is the intersection of $W_n(\theta^*)$ and the region $y_n(\theta^*) < t$. The existence of a subsequence $\{n''\}$ such that (29) holds follows from the fact that

$$F_n(t_2 \mid \theta^*) - F_n(t_1 \mid \theta^*) \le \Phi_n(t_2 \mid \theta^*) - \Phi_n(t_1 \mid \theta^*) \quad \text{for } t_2 > t_1, \tag{30}$$

and

$$\lim_{n'' = \infty} \Phi_{n''}(t \mid \theta^*_{n''}) = \frac{1}{\sqrt{2\pi d}} \int_{-\infty}^{t} e^{-v^2/2d}\,dv = N(t), \tag{31}$$

where $\Phi_n(t \mid \theta^*)$ denotes the probability $P[y_n(\theta^*) < t \mid \theta^*]$. Furthermore it can easily be shown that

$$\int_{-\infty}^{+\infty} dF(t) = \alpha. \tag{32}$$

On account of Proposition 5 we get from (25), (27), (28), (29), (30) and (31)

$$\int_{-\infty}^{+\infty} e^{\mu t - \frac{1}{2}\mu^2 d}\,dF(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2}\mu^2 d}\,dN(t) = \delta, \tag{33}$$

where $A$ denotes a value such that

$$\int_{A}^{\infty} dN(t) = \alpha.$$

It has been shown in a previous paper³ that (33) leads to a contradiction. Hence Theorem 1 is proved.

Theorem 2: Denote by $S_n(\theta^*)$ the region defined by the inequality $y_n(\theta^*) < A_n(\theta^*)$ where $A_n(\theta^*)$ is chosen such that $P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region $W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$$

with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted to values $< \theta^*$. Then for any sequence $\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$

The proof is omitted, since it is analogous to that of Theorem 1.

Theorem 3: Let $\{W_n(\theta^*)\}$ be for each $\theta^*$ a sequence of regions for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$ and $\lim_{n = \infty} g[W_n(\theta^*)] = \alpha$ uniformly in $\theta^*$. Denote by $L_n[W_n(\theta^*)]$ the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[\,|y_n(\theta^*)| > A_n(\theta^*) \mid \theta\,]$$

with respect to $\theta$ and $\theta^*$, where $A_n(\theta^*)$ is chosen such that

$$P[\,|y_n(\theta^*)| > A_n(\theta^*) \mid \theta^*\,] = \alpha.$$

Then

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \le 0.$$


³ See p. 12 of the paper cited in ².





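Proof: Denote $P[y_n(\theta^*) < t \mid \theta^*]$ by $\Phi_n(t \mid \theta^*)$ and denote by $F_n(t \mid \theta^*)$ the probability (under the hypothesis $\theta = \theta^*$) of the intersection of $W_n(\theta^*)$ with the region $y_n(\theta^*) < t$. Assume that Theorem 3 is not true. Then there exists a subsequence $\{n''\}$, a sequence $\{\theta^*_{n''}\}$ and a sequence $\{\theta_{n''}\}$ such that

$$\lim d(\theta^*_{n''}) = d; \quad \lim (\theta_{n''} - \theta^*_{n''})\sqrt{n''} = \lim \mu_{n''} = \mu; \quad \lim F_{n''}(t \mid \theta^*_{n''}) = F(t)$$

uniformly in $t$, and

$$\int_{-\infty}^{+\infty} e^{\mu t - \frac{1}{2}\mu^2 d}\,dF(t) - \int_{-\infty}^{-A} e^{\mu t - \frac{1}{2}\mu^2 d}\,dN(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2}\mu^2 d}\,dN(t) = \delta \tag{34}$$

where $A$ is a positive number such that

$$\int_{-\infty}^{-A} dN(t) = \frac{\alpha}{2} \quad \text{and} \quad N(t) = \frac{1}{\sqrt{2\pi d}} \int_{-\infty}^{t} e^{-v^2/2d}\,dv.$$

This can be proved in the same way as (33) has been proved. The author has shown in a previous paper⁴ that (34) leads to a contradiction. Hence Theorem 3 is proved.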

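Theorem 4: Denote by $A_n(\theta^*)$ the region of type A⁵ of size $\alpha$ for testing the hypothesis $\theta = \theta^*$. Denote by $B_n(\theta^*)$ the region $|y_n(\theta^*)| > C_n(\theta^*)$ where $C_n(\theta^*)$ is determined such that

$$P[\,|y_n(\theta^*)| > C_n(\theta^*) \mid \theta^*\,] = \alpha.$$

Then, under the assumption that $E_\theta \left[ \frac{\partial^2}{\partial\theta^2} \log f(x, \theta^*) \right]^2$ is bounded,

$$\lim_{n = \infty} \left\{ P[A_n(\theta^*) \mid \theta] - P[B_n(\theta^*) \mid \theta] \right\} = 0$$

uniformly in $\theta$ and $\theta^*$.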

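Proof: The region $A_n(\theta^*)$ is given by the inequality⁶

$$\left[ \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) \right]^2 + \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > k'_n(\theta^*) \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) + k''_n(\theta^*) \tag{35}$$

where $k'_n(\theta^*)$ and $k''_n(\theta^*)$ are chosen such that $A_n(\theta^*)$ should be unbiased and of size $\alpha$. The inequality (35) can be written also in the form

$$[y_n(\theta^*)]^2 + \frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > l'_n(\theta^*)\, y_n(\theta^*) + l''_n(\theta^*). \tag{36}$$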

⁴ See p. 14 of the paper cited in ².

⁵ Neyman, J. and Pearson, E. S., "Contributions to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 1.
⁶ See the paper cited in ⁵.





Let )/i„) be a bounded sequence. From Assumption 2 it follows that for any 
positive e 


(37) 




\ogf(x a] e *) + d(js*) j < < j o* + 



1 


uniformly in 6*. Since (37) holds for arbitrarily small e, we get easily on ac¬ 
count of Proposition 3 


(38) 


lim 


(pk)l^ + -V 1 - /Ta'(o*) 

L L VnJ L 


e* + . 

v 


uniformly in 6*, where A 1(0*) is defined by 


(39) [y n (8*)f > l' n (0*)y n (0*) + %(8*) + d(0*). 


Since A n (0*) is unbiased and of size a, we have oil account of (38) and (30) 

(40) lim l' n (9*) = 0 and 

(41) lim + d(Q*) = \(0*) > 0 


viniformly in 0*, where X(0 + ) is Riven by the condition 


(42) 


1 r f v'm 
V2 ird{d+) 


e - 


(1L = a. 


Inequality (39) is obviously equivalent to the simultaneous inequalities: 

V»(8*) < c'„(0*) and y n {8*) > c"(0*) 

where c' n (8*) and c"(8*) are the roots of the equation in y n (8*) 

[yn(9*)f = lUe*)y n {e*) + I’Ue*) + d(e*). 

Since 

lim c'„(fi*) = — and limc'^fl*) = + ■\/'K(8*) 

uniformly in 8*, from Proposition 3 it follows that 


lim s P 


(43) 


>[A n (0*)|0*+ 

Vn. 


f A(S,> dN(l I 8*) - [” dN(l |tf*)}« 0 

J-ai J 


+V / X(ij*j 


uniformly in 8*. 


Now let us consider a sequence (v„} such that lim | p„ | = « and lim — 0. 

V n 





We shall prove that 
(44) 


p[^4„(0*)|r + J^] = 1 


r 3 2 t 

uniformly in 0*. Since Ee — 2 log fix, 6*) is assumed to be bounded, 

(45) iW,[|>g/(M*)] 


and 

(46) 


F r d * 

^ 0 *+(*„/+"> ^2 


log fix, 6*) 


are bounded functions of 0* and n. We get by Taylor expansion 

? ft log f(x - • 9 *> - ? Ss ,oe f (*• ■■ <l * + 7s) 


(47) 


where 0* lies in 


0*, 0* + -^ 

V«_ 


. Hence 


(48) 


— — V n -Ej'+On/Vn) ~ 23 1°S /(*« ! ®") • 


From Assumption 2 and lim | v n j = « it follows that the absolute value of 
the right hand side of (48) converges to oo. Hence 

lim | E e *+v n ,T/n[y n (0*)] | = 00 • 

Since on account of Assumption 1 

«/(*», e *)] 

is a bounded function of n and 0*, also the variance of y„(0*) (under the assump¬ 
tion that 0 = 0* + vjy/n is the true value of the parameter) is a bounded 
function of n and 0* Hence for any arbitrary large constant C 

(49) lim P [| y n (6*) | > C | 0* + = 1, 

uniformly in d*. The equation (44) follows easily from (36), (40), (41), (45), 
(46) and (49). 





ri\ 

Consider a sequence {p n j such that > (i > 0 for all n. Then it follows 
easily from Proposition 1 that for any arbitrary C 


lim P | y*(6*) \>C\0* + ^ -1 


uniformly in 0*. Since lit lop; f{x „, 0*) is assumed to be bounded, and 
a* 

therefore also Jft -, lope/(.r, 6*) is bounded, there exists a finite n such that 
<70* 

(51) lim; '{!i?,W l0g/<J '”' S ’ ) | <!,|l ’* + V , n} = 1 

uniformly in 9* From (30), (40), (41), (00) and (51) it follows 


lim V 


A n (0*) j 




uniformly in $\theta^*$. Since on account of Propositions 3 and 4, the relations (43), (44) and (52) hold if we substitute $B_n(\theta^*)$ for $A_n(\theta^*)$, Theorem 4 is proved.

If Assumptions 1-4 are fulfilled for the set $\omega$ consisting of the single point $\theta = \theta_0$, then we get from Theorems 1-4 the following corollaries:

Corollary 1. Let $W'_n$ be the region defined by the inequality $y_n(\theta_0) > c'_n$, $W''_n$ defined by the inequality $y_n(\theta_0) < c''_n$, and $W_n$ defined by the inequality $|y_n(\theta_0)| > c_n$, where the constants $c'_n$, $c''_n$ and $c_n$ are chosen such that

$$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

Then $\{W'_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ if $\theta$ takes only values $> \theta_0$. Similarly $\{W''_n\}$ is an asymptotically most powerful test if $\theta$ takes only values $< \theta_0$. Finally $\{W_n\}$ is an asymptotically most powerful unbiased test if $\theta$ can take any real value.

Corollary 2. The sequence $\{A_n(\theta_0)\}$ is an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$, where $A_n(\theta_0)$ denotes the critical region of type A for testing $\theta = \theta_0$.



ON THE DISTRIBUTION OF THE QUOTIENT OF TWO CHANCE VARIABLES

By J. H. Curtiss 

Cornell University 

1. Introduction. Although the quotient of two chance variables appears frequently in mathematical statistics, the methods used in the literature to derive the distributions of quotients have usually been special ones devised for the particular variables under consideration, and in no way indicative of the general result. It is the purpose of this paper to study the distribution of the quotient of two variables for itself alone, with attention first to the question of existence, and then to the accurate derivation of a number of general formulas for the frequency function and d.f.¹ The principal formulas which we shall derive may be described briefly as follows (the numerals refer to the equation numbers in the text).

(3.1). The frequency function of the quotient of two variables which have an absolutely continuous joint probability function.

(4.11), (4.12). The d.f. of the quotient of a pair of arbitrary independent variables, expressed in terms of the d.f.'s of these variables.

(5.2). The d.f. of the quotient of a pair of arbitrary independent variables, expressed in terms of the c.f.'s² of these variables.

(6.4). The limiting form of the d.f. of a quotient of two sums of arbitrary identical independent variables.

(7.1). A formula analogous to (3.1) for the product of two chance variables.

(7.2). A formula analogous to (4.11) for the product of two chance variables.

2. The existence of the quotient distribution. The function $Z = X/Y$ is a continuous function of $X$ and $Y$, finite and uniquely defined for all points $(X, Y)$ such that $Y \ne 0$. Therefore if $P\{Y = 0\} = 0$, the pr.f.³ $P(S)$ of the joint distribution of $X$ and $Y$ determines a probability distribution for $Z$ (see [1, pp. 12-13]). To avoid irrelevant difficulties, we shall assume in the sequel that $P\{Y = 0\} = 0$ unless definite statement is made to the contrary. This assumption involves no real restriction on our work, for in situations in which, a priori, the assumption is not fulfilled, we can always replace the distribution

¹ I.e., distribution function. The underlying axioms, terminology, and abbreviations in this paper are uniform with those of Cramér's book [1]. For the definition of d.f., see
U, P m. 

² I.e., characteristic functions. See [1, p. 23].

³ I.e., probability function, [1, p. 9].




of $Y$ by the conditional distribution of $Y$ relative to the hypothesis that $Y \ne 0$. In such cases, then, the distribution of $Z$ which we are about to study is to be interpreted as a conditional distribution relative to this hypothesis.

We shall suppose that the space of $X$ is the $x$-axis, that of $Y$, the $y$-axis, and that of $Z$, the $z$-axis. It is quite readily seen that the set of points in the $(x, y)$ plane which corresponds to the set $Z \le z$ consists of

(i) the infinite region⁴ in the upper half-plane which is bounded by the negative $x$-axis and by the line $x = zy$;

(ii) the infinite region in the lower half-plane bounded by the positive $x$-axis and the line $x = zy$;

(iii) the line $x = zy$ except for the origin.

Denoting this set by $S_z$, we have

$$H(z) = \int_{S_z} dP(S) = P(S_z),$$

where $H(z)$ is the d.f. of $Z$. The present paper, from the viewpoint of analysis, is simply a study of the Lebesgue-Stieltjes integral appearing in this equation.


3. The continuous case. Suppose first that $P(S)$ is absolutely continuous. This means that the joint distribution of $X$ and $Y$ has a frequency function $\varphi(x, y)$, which is defined almost everywhere, is non-negative, and has the property that $P(S) = \int_S \varphi(x, y)\,dx\,dy$. In general, this integral must be taken in the Lebesgue sense, but of course if the discontinuities of $\varphi$ form a set of two-dimensional measure zero, and if the Jordan content of any bounded portion of the boundary of $S$ is zero, then this integral is just an ordinary improper double Riemann integral.⁵ In particular, these conditions are fulfilled if $\varphi$ is continuous everywhere and if $S = S_z$.

The transformation $x = uv$, $y = v$, gives a continuous one-to-one map of $S_z$ onto a set of the $(u, v)$ plane which consists of the closed half-plane lying to the left of the line $u = z$, but with the $u$-axis deleted. The Jacobian of the transformation has the absolute value $|v|$. By the theorem for change of variables in Lebesgue integrals [4, pp. 653-655], we have

$$H(z) = \int_{S_z} \varphi(x, y)\,dx\,dy = \int_{u \le z,\; v \ne 0} |v|\,\varphi(uv, v)\,du\,dv.$$

By Fubini's Theorem [6, pp. 203-208], the last integral can be expressed as a repeated integral. Integrating first with respect to $v$, we obtain this result.

Theorem 3.1: If the joint variable $(X, Y)$ has the frequency function $\varphi(x, y)$, then

$$H(z) = \int_{-\infty}^{z} \int_{-\infty}^{+\infty} |v|\,\varphi(uv, v)\,dv\,du,$$


⁴ I.e., open connected set.

⁵ See [4, pp. 476-478; p. 575].





and consequently $H(z)$ is an absolutely continuous function of $z$. The frequency function of the distribution of $Z$ exists almost everywhere, and is given by the formula

$$h(z) = H'(z) = \int_{-\infty}^{+\infty} |v|\,\varphi(zv, v)\,dv. \tag{3.1}$$

We remark that if $X$ and $Y$ are independent, so that $\varphi(x, y) = f(x) \cdot g(y)$, where $f$ and $g$ are respectively the frequency functions of $X$ and $Y$, then (3.1) may be written in the form

$$h(z) = \int_{-\infty}^{+\infty} |v|\,f(zv)\,g(v)\,dv. \tag{3.2}$$

This case was considered recently by Huntington [5], with the additional restrictions that $g(y) = 0$, $y < 0$, and that $f(x)$ and $g(y)$ be continuous.

All the familiar special quotient distributions of applied mathematical statistics, such as Student's $t$ and Fisher's $z$, may conveniently and rigorously be derived by means of (3.1) and (3.2); in each case the required result follows immediately after an obvious change of variables in the integrand. We pause here only to point out explicitly the result obtained when $X$ and $Y$ have a normal joint distribution with variances $\sigma_x^2$, $\sigma_y^2$, and correlation coefficient $\rho$. If the means $E(X)$ and $E(Y)$ are not equal to zero, it is apparently impossible to evaluate (3.1) in closed form; this case has been studied in some detail by Geary [3] and by Fieller [2]. But if $E(X) = E(Y) = 0$, then

$$h(z) = \frac{\sigma_x \sigma_y \sqrt{1 - \rho^2}}{\pi} \cdot \frac{1}{\sigma_y^2 \left(z - \rho \frac{\sigma_x}{\sigma_y}\right)^2 + \sigma_x^2 (1 - \rho^2)},$$

which is the frequency function of a Cauchy distribution with mode at the point $z = \rho \sigma_x / \sigma_y$, the value of the regression coefficient of $X$ on $Y$. If $X$ and $Y$ are independent, then $\rho = 0$, and the frequency function becomes

$$h(z) = \frac{\sigma_x \sigma_y}{\pi} \cdot \frac{1}{\sigma_y^2 z^2 + \sigma_x^2}. \tag{3.3}$$

4. The quotient of two arbitrary independent variables. We shall henceforth drop the restriction that $P(S)$ be absolutely continuous, but shall suppose instead that $X$ and $Y$ are independent chance variables with one-dimensional distributions of the most general type, except that the distribution of $Y$ will be subject to the restriction that $P\{Y = 0\} = 0$.

We denote the d.f. of $X$ by $F(x)$, that of $Y$ by $G(y)$, and, as usual, that of $Z$ by $H(z)$. It is to be noticed that the condition $P\{Y = 0\} = 0$ implies that $G(y)$ is continuous at the point $y = 0$. Let

$$f(t) = \int_{-\infty}^{+\infty} e^{itx}\,dF(x), \qquad g^+(t) = \int_{0}^{+\infty} e^{ity}\,dG(y), \qquad g^-(t) = \int_{-\infty}^{0} e^{ity}\,dG(y). \tag{4.1}$$

Clearly

$$H(z) = P\{X - zY \le 0;\; Y > 0\} + P\{X - zY \ge 0;\; Y < 0\}. \tag{4.2}$$

We introduce the functions

$$\begin{aligned}
\Gamma_1(u) &= P\{X - zY \le u;\; Y > 0\} = [1 - G(0)] \cdot P\{X - zY \le u \mid Y > 0\},^{6} \\
\gamma_1(t) &= \int_{-\infty}^{+\infty} e^{itu}\,d\Gamma_1(u), \\
\Gamma_2(u) &= P\{zY - X \le u;\; Y < 0\} = G(0) \cdot P\{zY - X \le u \mid Y < 0\}, \\
\gamma_2(t) &= \int_{-\infty}^{+\infty} e^{itu}\,d\Gamma_2(u), \\
\Gamma(u) &= \Gamma_1(u) + \Gamma_2(u), \\
\gamma(t) &= \int_{-\infty}^{+\infty} e^{itu}\,d\Gamma(u) = \gamma_1(t) + \gamma_2(t).
\end{aligned} \tag{4.3}$$

By (4.2) and (4.3),

$$H(z) = \Gamma(0). \tag{4.4}$$

We shall now evaluate $\Gamma_1(u)$ and $\Gamma_2(u)$ in terms of $F(x)$ and $G(y)$, and also $\gamma_1(t)$ and $\gamma_2(t)$ in terms of $f(t)$, $g^+(t)$, and $g^-(t)$.

Let us assume for a moment that $P\{Y > 0\} \ne 0$; that is, that $G(0) < 1$. The conditional distribution of $Y$ relative to the hypothesis that $Y > 0$ then has the d.f.

$$G_1(y) = \begin{cases} \dfrac{G(y) - G(0)}{1 - G(0)}, & y \ge 0, \\[1ex] 0, & y < 0. \end{cases} \tag{4.5}$$

The d.f. of $-zY$ relative to this hypothesis is $G_1(-y/z)$ if $z < 0$, and $1 - G_1[(-y/z) - 0]$ if $z > 0$.

⁶ By $P(A \mid b)$ is meant the conditional probability of the event $A$ relative to the hypothesis $b$.





It is well known that the corresponding d.f. of the s um X ■+ ( — zY) is given 
by a convolution of the d.f’s of X and (~zY ). 7 In the present case, this result 
takes the form 


(4 6 ) P{X-zY ^u\Y>0} 


f_ + ~F(u~v)dG 1 (-^, z < 0, 


z >0. 


Referring to the definition of these Lebesgue-Stieltjes integrals [4, pp 662-663], 
we see that the change of variables w = — u/z yields the equations 


(4.7) 


P{X - zY ^ u \ Y > 0 } 



F(u + zw) dG^w), z < 0, 

F(u + zw) dGi(w — 0), z > 0. 


Now the definition of the variation of G L (y) [4, pp 341-342] used in forming 
these Lebesgue-Stieltjes integrals makes no distinction between the variation of 
Gi(y) and that of 6 * 1 ( 1 / — 0) over any bounded set contained in an interval of 
integration a < y < eo, provided that G x (y) is continuous at a in the two-sided 
sense. Since G\(y) is continuous at y — 0 in this sense, it is possible to replace 
Gi(w — 0 ) by 6 *i(iy) in the second of the two integrals in (4.7). 

Equation (4.7) is clearly true for 2 = 0 as well as for all other values of z. 
Referring to (4 5) and (4.3), we see that 



F(u + zw) dG(w), 


all z. 


The c.f. of the convolution (4.6) is the product of the c.f.'s of $X$ and of the conditional distribution of $-zY$ [1, p. 36]. This product is $f(t) \int_{0}^{\infty} e^{-itzy}\, dG_1(y)$. Thus by (4.5), (4.3), and (4.1),

$$\gamma_1(t) = [1 - G(0)] \left[ f(t) \int_{0}^{\infty} e^{-itzy}\, dG_1(y) \right] = f(t)\, g^+(-tz). \tag{4.8}$$

We have established (4.7) and (4.8) under the condition that $P\{Y > 0\} \neq 0$. However, it is obvious that they are trivially true if $P\{Y > 0\} = 0$.

We turn now to $\Gamma_2(u)$. Supposing that $P\{Y < 0\} \neq 0$, the conditional distribution of $Y$ relative to the hypothesis that $Y < 0$ has the d.f.

$$G_2(y) = \begin{cases} \dfrac{G(y)}{G(0)}, & y < 0, \\ 1, & y \geq 0. \end{cases}$$


7 See [1, pp. 35-36]; also [7].



414 


J. H. CURTISS 


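The conditional distribution of $zY$ has the d.f. $G_2(y/z)$ for $z > 0$, and $1 - G_2[(y/z) - 0]$ for $z < 0$. The d.f. of $-X$ is $1 - F(-x - 0)$. Thus

$$P\{zY - X \leq u \mid Y < 0\} = \begin{cases} \displaystyle\int_{-\infty}^{+\infty} \{1 - F[-(u - v) - 0]\}\, d\{1 - G_2[(v/z) - 0]\}, & z < 0, \\[2ex] \displaystyle\int_{-\infty}^{+\infty} \{1 - F[-(u - v) - 0]\}\, dG_2(v/z), & z > 0, \end{cases}$$

$$= 1 - \int_{-\infty}^{0} F(zw - u - 0)\, dG_2(w).$$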


Evidently the first and last members of this equation are equal for $z = 0$ as well as for all other values of $z$. From (4.3) we obtain

$$\Gamma_2(u) = G(0) - \int_{-\infty}^{0} F(zw - u - 0)\, dG(w), \qquad \text{all } z.$$

Also, as before,

$$\gamma_2(t) = f(-t)\, g^-(zt).$$

Obviously, the last two equations are still true if $P\{Y < 0\} = 0$.

To summarize, we have shown that

$$\Gamma(u) = G(0) + \int_{0}^{\infty} F(u + zw)\, dG(w) - \int_{-\infty}^{0} F(zw - u - 0)\, dG(w), \qquad \text{all } z; \tag{4.9}$$

$$\gamma(t) = f(t)\, g^+(-zt) + f(-t)\, g^-(zt). \tag{4.10}$$


Referring now to (4.4) and letting $u = 0$ in (4.9), we are able to state the following theorem:

Theorem 4.1: If $X$ and $Y$ are independent chance variables with respective d.f.'s $F(x)$ and $G(y)$, the d.f. of the quotient $X/Y$ is given by the formula

$$H(z) = G(0) + \int_{0}^{\infty} F(zw)\, dG(w) - \int_{-\infty}^{0} F(zw - 0)\, dG(w) \tag{4.11}$$

for all values of $z$.
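In the purely discrete case the Stieltjes integrals in (4.11) reduce to sums over the jump points of $G$, and the formula can be checked by direct enumeration. The following Python sketch does this with exact rational arithmetic. The two distributions and all function names are illustrative assumptions, not taken from the paper; the sketch also assumes $P\{Y = 0\} = 0$, as the text does.

```python
from fractions import Fraction as Fr

# Hypothetical discrete distributions, given as value -> probability.
X = {-2: Fr(1, 4), 0: Fr(1, 4), 3: Fr(1, 2)}
Y = {-1: Fr(1, 3), 2: Fr(2, 3)}          # no mass at the origin


def cdf(dist, x):
    """F(x) = P{V <= x}."""
    return sum(p for v, p in dist.items() if v <= x)


def cdf_left(dist, x):
    """F(x - 0) = P{V < x}."""
    return sum(p for v, p in dist.items() if v < x)


def quotient_cdf_formula(z):
    """Right-hand side of (4.11) for discrete X and Y."""
    g0 = cdf(Y, 0)
    pos = sum(q * cdf(X, z * w) for w, q in Y.items() if w > 0)
    neg = sum(q * cdf_left(X, z * w) for w, q in Y.items() if w < 0)
    return g0 + pos - neg


def quotient_cdf_direct(z):
    """P{X/Y <= z} obtained by enumerating every pair (x, y)."""
    return sum(p * q for x, p in X.items() for y, q in Y.items()
               if Fr(x, y) <= z)


for z in (Fr(-3), Fr(-1), Fr(0), Fr(3, 2), Fr(2), Fr(5)):
    assert quotient_cdf_formula(z) == quotient_cdf_direct(z)
```

The left-hand limit `cdf_left` matters only when $zw$ falls on a jump of $F$ for some negative jump point $w$ of $G$, which is the situation singled out by the set $M_1$ of Lemma 4.1.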

We shall not attempt to make a careful study of the above formula, such as 
the studies which certain writers have made of convolutions. However, it does 
seem desirable to place on record here certain remarks concerning it of a more 
or less superficial character. For convenience in later reference, we state these 
remarks in the form of four lemmas. 

Lemma 4.1. Let $M_1$ be the set of all values of $z$ such that if $z \in M_1$, the set of discontinuity points of $F(zw)$ on the $w$-axis has a point in common with the point spectrum of $G(w)$. Then if $z \in C(M_1)$,⁸ the integrals $\int_{0}^{\infty} F(zw \pm 0)\, dG(w)$, $\int_{-\infty}^{0} F(zw \pm 0)\, dG(w)$, are Riemann-Stieltjes integrals and consequently the integrands can be replaced by $F(zw)$ without altering the values of the integrals.

The lemma follows immediately from the definitions of Riemann-Stieltjes and Lebesgue-Stieltjes integrals.

8 By $C(M_1)$ we mean the complement of $M_1$ with respect to the $z$-axis.

Lemma 4.2: The set $M_1$ is denumerable.

The proof can easily be supplied by the reader.

Lemma 4.3: Let $M_2$ be the set of all values of $z$ such that if $z \in M_2$, $\Gamma(u)$ is discontinuous at $u = 0$. Then $M_2 \subset M_1$.

To prove this statement, we first observe that $\Gamma(u)$ is a genuine d.f. [1, p. 11]. For obviously $\Gamma(-\infty) = 0$, $\Gamma(+\infty) = 1$, and since $\Gamma_1(u)$ and $\Gamma_2(u)$ are both products of d.f.'s into constants, these two functions, and therefore $\Gamma(u)$, must be continuous from the right. It is this last property of $\Gamma(u)$ which is needed for our present purposes; in particular, we have the relation $\lim_{u \to +0} \Gamma(u) = \Gamma(0) = H(z)$. On the other hand, by the general convergence theorem for Lebesgue-Stieltjes integrals [4, pp. 663-664], we have

$$\lim_{u \to -0} \Gamma(u) = G(0) + \int_{0}^{\infty} F(zw - 0)\, dG(w) - \int_{-\infty}^{0} F(zw)\, dG(w).$$

If $z$ be chosen so that these integrals and the ones in (4.11) are all Riemann-Stieltjes integrals, the expression $(zw - 0)$, wherever it appears, may be replaced by $zw$ without changing the values of the integrals. Thus for such a value of $z$, $\Gamma(+0) = \Gamma(-0)$. According to Lemma 4.1, we can be sure that at least if $z \in C(M_1)$, the integrals here will be Riemann-Stieltjes integrals, so our proposition is proved.

Since $H(z_1 + 0)$ is equal to $\Gamma(+0)$ with $z = z_1$, and $H(z_1 - 0)$ is equal to $\Gamma(-0)$ with $z = z_1$, we have the following result:

Lemma 4.4. The set $M_2$ is the set of discontinuity points of $H(z)$.

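By using the alternate form of the convolutions used to derive (4.9), we obtain a representation of $\Gamma(u)$ somewhat more complicated than that appearing in (4.9). The corresponding formula for $H(z)$ is as follows:

$$H(z) = \begin{cases} G(0)[1 - F(-0)] - G(0)F(0) + \displaystyle\int_{-\infty}^{0} G\!\left(\frac{v}{z}\right) dF(v) - \int_{0}^{\infty} G\!\left(\frac{v}{z} - 0\right) dF(v - 0), & z < 0; \\[2ex] F(0)[1 - G(0)] + G(0)[1 - F(-0)], & z = 0; \\[2ex] 1 + G(0)[1 - F(-0)] - G(0)F(0) + \displaystyle\int_{-\infty}^{0} G\!\left(\frac{v}{z}\right) dF(v - 0) - \int_{0}^{\infty} G\!\left(\frac{v}{z} - 0\right) dF(v), & z > 0. \end{cases} \tag{4.12}$$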


5. Representation of $H(z)$ by characteristic functions. A simple algebraic formula connecting the c.f. of $Z$ with those of $X$ and $Y$ is not available. However, there exists an interesting representation of $H(z)$ in terms of the functions $f(t)$, $g^+(t)$, and $g^-(t)$. The result may be stated as follows:

Theorem 5.1:⁹ Let the distributions of the independent variables $X$ and $Y$ have finite first absolute moments, and let the integral

$$\left( \int_{-\infty}^{-1} + \int_{1}^{+\infty} \right) \left| \frac{f(t)\, g^+(-zt) + f(-t)\, g^-(zt)}{t} \right| dt \tag{5.1}$$

be finite for each value of $z$. Let $\Delta(u)$ be any d.f. with a finite first absolute moment, and let $\left( \int_{-\infty}^{-1} + \int_{1}^{+\infty} \right) \left| \dfrac{\delta(t)}{t} \right| dt$ be finite, where $\delta(t)$ is the c.f. of $\Delta(u)$. Then

$$H(z) = \Delta(0) - \frac{1}{2\pi i} \int_{-\infty}^{+\infty} \frac{f(t)\, g^+(-zt) + f(-t)\, g^-(zt) - \delta(t)}{t}\, dt. \tag{5.2}$$

If the integral obtained by formal differentiation under the integral sign with respect to $z$ in (5.2) is uniformly convergent in a certain interval $I$, then the frequency function $h(z)$ of the distribution of $Z$ exists in that interval and is given by the formula

$$h(z) = \frac{1}{2\pi i} \int_{-\infty}^{+\infty} \left[ f(t)\, g^{+\prime}(-zt) - f(-t)\, g^{-\prime}(zt) \right] dt.$$

We remark that the condition (5.1) will be satisfied for all values of $z$ if $f(t)$ alone satisfies a similar condition, inasmuch as $|g^+(t)| \leq 1$, $|g^-(t)| \leq 1$. Important special cases of the theorem arise when $\Delta(u)$ is replaced by $F(u)$ or $G(u)$, and when $\Delta(u)$ is so chosen that $\Delta(0) = 0$.

Our proof of the theorem will depend on a rather general result due to Cramér [1, Theorem 12], which we shall restate here in the special form applicable to the problem at hand.

Lemma 5.1: Let $R(u)$ be a function of bounded variation over the infinite interval $-\infty < u < \infty$, let $\lim_{u \to -\infty} R(u) = \lim_{u \to +\infty} R(u) = 0$, and let $r(t) = \int_{-\infty}^{+\infty} e^{itu}\, dR(u)$. If (a) $\int_{-\infty}^{+\infty} |u|\, |dR(u)|$ and (b) $\left( \int_{-\infty}^{-1} + \int_{1}^{+\infty} \right) \left| \dfrac{r(t)}{t} \right| dt$, both are finite, then for every value of $u$,

$$R(u) = -\frac{1}{2\pi i} \int_{-\infty}^{+\infty} \frac{r(t)}{t}\, e^{-itu}\, dt.$$


To prove Theorem 5.1, we observe that since $\Gamma(u)$ is a d.f. (see proof of Lemma 4.3), the difference $\Gamma(u) - \Delta(u)$ is a function similar to the function $R(u)$ of the lemma. If we do let $R(u) = \Gamma(u) - \Delta(u)$, it follows at once that $r(t) = \gamma(t) - \delta(t) = f(t)\, g^+(-zt) + f(-t)\, g^-(zt) - \delta(t)$. If we can verify that this $R(u)$ satisfies conditions (a) and (b) of the lemma, then we shall have established the relation,

$$\Gamma(u) = \Delta(u) - \frac{1}{2\pi i} \int_{-\infty}^{+\infty} \frac{f(t)\, g^+(-zt) + f(-t)\, g^-(zt) - \delta(t)}{t}\, e^{-itu}\, dt,$$

for all values of $u$, and letting $u = 0$ in this equation, we shall obtain (5.2).

Condition (b) of the lemma is taken care of by (5.1) and the condition on $\delta(t)$ in Theorem 5.1. Clearly condition (a) will be satisfied if it turns out that $\Gamma(u)$ has a finite first absolute moment. Now the existence of finite first absolute moments of $X$ and $Y$ will insure the existence of finite first absolute moments for the conditional distributions involved in the definitions of $\Gamma_1(u)$ and $\Gamma_2(u)$, because $E|X - zY| \leq E|X| + |z|\, E|Y|$. It follows quite readily from this that the first absolute moment of $\Gamma(u)$ is finite. The proof of the theorem is complete.

9 The theorem is due to Cramér in the case in which $G(0) = 0$, and $\Delta(u) = G(u)$. See [1, Theorem 16].


6. Distributions of variable form. We consider now the case in which the 
distributions of the numerator and denominator approach limiting forms. 

Theorem 6.1: Let the independent variables $X_\alpha$ and $Y_\beta$ have respective d.f.'s $F_\alpha(x)$ and $G_\beta(y)$ which depend upon the two parameters $\alpha$ and $\beta$. Let $H_{\alpha,\beta}(z)$ be the d.f. of the quotient $Z_{\alpha,\beta} = X_\alpha / Y_\beta$. If there exist two chance variables $X$ and $Y$ with respective distribution functions $F(x)$ and $G(y)$ such that $\lim_{\alpha \to \infty} F_\alpha(x) = F(x)$ at all points of continuity of $F(x)$, and $\lim_{\beta \to \infty} G_\beta(y) = G(y)$ at all points of continuity of $G(y)$, then

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} H_{\alpha,\beta}(z) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} H_{\alpha,\beta}(z) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} H_{\alpha,\beta}(z) = H(z) \tag{6.1}$$

at all points of continuity of $H(z)$, where $H(z)$ is the d.f. of the variable $X/Y$. The double limit in (6.1) is uniform in any finite or infinite interval of continuity of $H(z)$.

In the interpretation of the limits involved in this theorem, it is to be understood that in the hypotheses, $\alpha$ may tend to infinity over any unbounded set $T_\alpha$ of the $\alpha$-axis, and $\beta$ may tend to infinity over any unbounded set $T_\beta$ of the $\beta$-axis, provided that in (6.1), $\alpha$ and $\beta$ are restricted so that $\alpha \in T_\alpha$ and $\beta \in T_\beta$.

To prove the theorem, we introduce functions $f_\alpha(t)$, $g_\beta^+(t)$, $g_\beta^-(t)$, $\Gamma_{\alpha,\beta}(u)$, $\gamma_{\alpha,\beta}(t)$, which are defined by equations (4.1) and (4.3) with $F$, $G$, $X$, $Y$ replaced respectively by $F_\alpha$, $G_\beta$, $X_\alpha$, $Y_\beta$. On the other hand, with reference to the distributions of $X$ and $Y$, we employ the notation of section 4 without modification. According to the work in that section, $\Gamma(u)$ is given by (4.9) and its c.f. $\gamma(t)$ is given by (4.10). Also,

$$\gamma_{\alpha,\beta}(t) = f_\alpha(t)\, g_\beta^+(-zt) + f_\alpha(-t)\, g_\beta^-(zt).$$

But it is an immediate consequence of our hypotheses that $\lim_{\alpha \to \infty} f_\alpha(t) = f(t)$, $\lim_{\beta \to \infty} g_\beta^+(t) = g^+(t)$, and $\lim_{\beta \to \infty} g_\beta^-(t) = g^-(t)$, all of the limits being uniform in any finite interval of values of $t$.¹⁰ Thus

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} \gamma_{\alpha,\beta}(t) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} \gamma_{\alpha,\beta}(t) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} \gamma_{\alpha,\beta}(t) = \gamma(t), \tag{6.2}$$

uniformly in any finite interval on the $t$-axis.

Consider the extreme members of (6.2). It follows immediately from a well-known general theorem¹¹ that $\lim_{\alpha \to \infty,\, \beta \to \infty} \Gamma_{\alpha,\beta}(u) = \Gamma(u)$ at all continuity points of $\Gamma(u)$. Then since $H_{\alpha,\beta}(z) = \Gamma_{\alpha,\beta}(0)$ and $H(z) = \Gamma(0)$, we find that

$$\lim_{\substack{\alpha \to \infty \\ \beta \to \infty}} H_{\alpha,\beta}(z) = H(z), \qquad z \in C(M_2),$$

where $M_2$ is the set defined in Lemma 4.3. By Lemma 4.4, the set $M_2$ is the set of discontinuity points of $H(z)$, so the equality of the first and last members of (6.1) is established at all continuity points of $H(z)$. The uniformity of the limit is due to a general property of convergent sequences of d.f.'s; see [1, p. 31].

The existence and equivalence to $H(z)$ of each of the iterated limits in (6.1) may be established by two consecutive applications of the foregoing argument, and by the use of (6.2). We leave the details to the reader.

It is to be remarked that both $H_{\alpha,\beta}(z)$ and $H(z)$ can be represented by (4.11), provided, of course, that $F$ and $G$ in (4.11) are replaced by $F_\alpha$ and $G_\beta$ in the case of $H_{\alpha,\beta}$; thus our theorem essentially states that the order of the double limit and the integration is immaterial in this formula. A similar remark applies to formula (6.2).

The reader is reminded that we have tacitly been assuming that the d.f. of any variable appearing in a denominator is continuous at the origin. In case $G_\beta(y)$ does not satisfy this condition, but $G(y)$ does satisfy it, and if, as suggested in section 2, we consider $H_{\alpha,\beta}(z)$ to be the d.f. of the conditional distribution of $Z_{\alpha,\beta}$ relative to the hypothesis that $Y_\beta \neq 0$, then it can be shown rather easily that Theorem 6.1 remains true with this modified interpretation. But if $G(y)$ is discontinuous at the origin, and if $H(z)$ is interpreted as the d.f. of the conditional distribution, then (6.1) may be no longer true, as can be shown by trivial examples.

Perhaps the most important cases of variable distributions arise in the con¬ 
sideration of sums of independent chance variables. We accordingly present the 
following synthesis of Theorem 6.1 and a simple case of the Central Limit 
Theorem. 

10 See [1, p. 30].

11 See [1, Theorem 11]. The result needed here is a trivial extension of the theorem cited.

Theorem 6.2: Let $U_1, U_2, \cdots$, be a sequence of identically distributed chance variables, each with mean zero and (finite) standard deviation $\sigma_U$, and let $V_1, V_2, \cdots$, be a sequence of identically distributed chance variables, each with mean zero and (finite) standard deviation $\sigma_V$. Furthermore, let the variables $U_i$ and $V_j$ be all independent, $i = 1, 2, \cdots$, $j = 1, 2, \cdots$. If $m$ and $n$ tend to infinity in such a way that

$$\lim_{\substack{m \to \infty \\ n \to \infty}} \sqrt{\frac{n}{m}} = k \neq 0, \tag{6.3}$$

then the d.f. of the conditional distribution of the variable

$$W_{m,n} = \frac{U_1 + U_2 + \cdots + U_m}{V_1 + V_2 + \cdots + V_n},$$

relative to the hypothesis that the denominator is different from zero, tends uniformly to the function

$$J(w) = \int_{-\infty}^{w} \frac{k\, \sigma_U \sigma_V}{\pi (k^2 \sigma_V^2 u^2 + \sigma_U^2)}\, du. \tag{6.4}$$


For if we let

$$Z_{m,n} = \frac{(U_1 + U_2 + \cdots + U_m) / (\sigma_U \sqrt{m})}{(V_1 + V_2 + \cdots + V_n) / (\sigma_V \sqrt{n})},$$

then $W_{m,n} = \sqrt{m/n}\, (\sigma_U / \sigma_V)\, Z_{m,n}$. The Central Limit Theorem [1, Theorem 20] states that the d.f.'s of the numerator and denominator of $Z_{m,n}$ each tend to the function $\int_{-\infty}^{x} (1/\sqrt{2\pi})\, e^{-t^2/2}\, dt$, which is the d.f. of a normal distribution with mean zero and variance one. By (3.3), the quotient of two variables, each of which has this d.f., has the continuous d.f. $H(z) = \int_{-\infty}^{z} (1/\pi)[1/(1 + x^2)]\, dx$. If we let $H_{m,n}(z)$ denote the d.f. of the conditional distribution of $Z_{m,n}$, relative to the hypothesis that the denominator of $Z_{m,n}$ is different from zero, then by Theorem 6.1, $\lim_{m \to \infty,\, n \to \infty} H_{m,n}(z) = H(z)$ uniformly in $z$. Now the d.f. of the conditional distribution of $W_{m,n}$ is $H_{m,n}[\sqrt{n/m}\, (\sigma_V / \sigma_U)\, w]$, and because of (6.3) and the uniformity of the limit of $H_{m,n}(z)$, this approaches $H[k (\sigma_V / \sigma_U) w]$. Differentiating the last expression with respect to $w$, we find that the resulting frequency function is equal to $J'(w)$, and this concludes the proof.

As an application of the theorem, let us consider the following problem. From an urn containing white and black balls in the proportion of $p$ to $1 - p$, we shall make 100 random drawings of a single ball with replacement after each drawing. Let $W_{50,50}$ be the ratio of the deviation of the number of white balls in the first 50 drawings from the expected number, to the deviation of the number of white balls in the second 50 drawings from the expected number. What is the approximate value of $w$ for which $P\{W_{50,50} \geq w \mid b\} = .05$, where the hypothesis $b$ is that the denominator of $W_{50,50}$ shall be different from zero?¹²

To answer this question, we observe that the numerator and denominator of $W_{50,50}$ can each be expressed as the sum of 50 independent identical chance variables, each with mean zero and with variance $p(1 - p)$. Thus according to Theorem 6.2, the approximate d.f. of $W_{50,50}$ is

$$J(w) = \int_{-\infty}^{w} \frac{1}{\pi}\, \frac{1}{1 + u^2}\, du = \frac{1}{2} + \frac{1}{\pi} \arctan w,$$

and the required value of $w$ satisfies the equation $J(\infty) - J(w) = .05$. The solution of this equation (correct to one decimal place) is $w = 6.3$.
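The numerical value in this application can be reproduced directly from the closed form of the limiting d.f. of Theorem 6.2, which is a Cauchy law with scale $\sigma_U / (k \sigma_V)$. The short Python sketch below evaluates that d.f. and inverts it; the function names are illustrative assumptions, and the result is only the limiting approximation, not the exact distribution of $W_{50,50}$.

```python
import math


def J(w, k=1.0, sigma_u=1.0, sigma_v=1.0):
    """Limiting d.f. (6.4): 1/2 + arctan(k * sigma_v * w / sigma_u) / pi."""
    return 0.5 + math.atan(k * sigma_v * w / sigma_u) / math.pi


def upper_point(alpha, k=1.0, sigma_u=1.0, sigma_v=1.0):
    """Value w for which J(+infinity) - J(w) = alpha."""
    return (sigma_u / (k * sigma_v)) * math.tan(math.pi * (0.5 - alpha))


# Urn example: m = n = 50 gives k = 1, and sigma_u = sigma_v.
w = upper_point(0.05)
print(round(w, 1))
```

With $k = 1$ and equal standard deviations the scale factor is one, so the answer does not depend on $p$.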

It is perhaps needless to remark that a study of the error involved in supposing $J(w)$ to be the d.f. of $W_{m,n}$ in Theorem 6.2, must necessarily precede the unreserved acceptance of numerical results obtained by means of that theorem.


7. Products of chance variables. We conclude this paper with a rather brief treatment of the distribution of the product of two chance variables. To preserve a notation uniform with that of the preceding sections, we shall write the product as $X = YZ$, where the d.f.'s of $X$, $Y$, and $Z$ are to be denoted, as before, by $F(x)$, $G(y)$, and $H(z)$, respectively. The existence of $F(x)$ is readily proved by the methods of section 2. The assumption that $P\{Y = 0\} = 0$ is of course unnecessary here, and will be dropped in this section.

In the continuous case, an argument similar to the one employed in section 3 will establish the following result:

Theorem 7.1: If the joint variable $(Y, Z)$ has the frequency function $\psi(y, z)$, then


Fix) 




and consequently $F(x)$ is an absolutely continuous function of $x$. The frequency function of the distribution of $X$ exists almost everywhere, and is given by the formula

$$f(x) = F'(x) = \int_{0}^{\infty} \frac{1}{v}\, \psi\!\left(v, \frac{x}{v}\right) dv - \int_{-\infty}^{0} \frac{1}{v}\, \psi\!\left(v, \frac{x}{v}\right) dv. \tag{7.1}$$


In the discontinuous case, with $Y$ and $Z$ independent, we can write $X = ZY = Z/(1/Y)$ and use Theorem 4.1 to derive a formula for $F(x)$. We have:

$$F(x) = P\{X \leq x\} = P\{Y \neq 0\}\, P\{X \leq x \mid Y \neq 0\} + P\{X \leq x;\ Y = 0\}.$$

12 This hypothesis would always be fulfilled in case $50p$ is not an integer.


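Excluding for a moment the trivial case in which $P\{Y \neq 0\} = 0$, let $G_1(y)$ be the d.f. of the conditional distribution of $(1/Y)$ relative to the hypothesis that $Y \neq 0$. Then

$$P\{Y \neq 0\}\, G_1(y) = \begin{cases} G(-0) + 1 - G\!\left(\dfrac{1}{y} - 0\right), & y > 0, \\[1.5ex] G(-0), & y = 0, \\[1.5ex] G(-0) - G\!\left(\dfrac{1}{y} - 0\right), & y < 0. \end{cases}$$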


It is to be observed that $G_1(y)$ is continuous at $y = 0$. Using Theorem 4.1, we find that

$$P\{X \leq x \mid Y \neq 0\} = G_1(0) + \int_{0}^{\infty} H(xw)\, dG_1(w) - \int_{-\infty}^{0} H(xw - 0)\, dG_1(w).$$

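So

$$P\{Y \neq 0\}\, P\{X \leq x \mid Y \neq 0\} = G(-0) + \int_{0}^{\infty} H(xw)\, d\!\left[-G\!\left(\frac{1}{w} - 0\right)\right] - \int_{-\infty}^{0} H(xw - 0)\, d\!\left[-G\!\left(\frac{1}{w} - 0\right)\right]$$

$$= G(-0) + \int_{0}^{\infty} H\!\left(\frac{x}{v}\right) dG(v) - \int_{-\infty}^{0} H\!\left(\frac{x}{v} - 0\right) dG(v).$$

This equation is trivially true if $P\{Y \neq 0\} = 0$. Also,

$$P\{X \leq x;\ Y = 0\} = \begin{cases} 0, & x < 0, \\ G(0) - G(-0), & x \geq 0. \end{cases}$$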


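Thus we obtain the following theorem:

Theorem 7.2: If $Y$ and $Z$ are independent chance variables with respective d.f.'s $G(y)$ and $H(z)$, then the d.f. of their product is given by the formula

$$F(x) = \int_{0}^{\infty} H\!\left(\frac{x}{v}\right) dG(v) - \int_{-\infty}^{0} H\!\left(\frac{x}{v} - 0\right) dG(v) + \begin{cases} G(-0), & x < 0, \\ G(0), & x \geq 0, \end{cases} \tag{7.2}$$

for all values of $x$.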
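The product formula of Theorem 7.2 can be illustrated in the same way as the quotient formula, and here a positive probability at $Y = 0$ is permitted. The Python sketch below takes the integrals over $v > 0$ and $v < 0$ as sums over the positive and negative jump points of $G$, reads $G(-0)$ as $P\{Y < 0\}$, and compares the result with a direct enumeration. The distributions and function names are illustrative assumptions only.

```python
from fractions import Fraction as Fr

# Hypothetical discrete distributions, given as value -> probability.
Y = {-1: Fr(1, 4), 0: Fr(1, 4), 2: Fr(1, 2)}   # mass at the origin is allowed
Z = {-1: Fr(1, 3), 1: Fr(1, 3), 3: Fr(1, 3)}


def cdf(dist, x):
    """P{V <= x}."""
    return sum(p for v, p in dist.items() if v <= x)


def cdf_left(dist, x):
    """P{V < x}, the left-hand limit of the d.f. at x."""
    return sum(p for v, p in dist.items() if v < x)


def product_cdf_formula(x):
    """F(x) for X = YZ: positive jumps of G, negative jumps of G, and the constant term."""
    pos = sum(p * cdf(Z, Fr(x) / v) for v, p in Y.items() if v > 0)
    neg = sum(p * cdf_left(Z, Fr(x) / v) for v, p in Y.items() if v < 0)
    const = cdf_left(Y, 0) if x < 0 else cdf(Y, 0)
    return pos - neg + const


def product_cdf_direct(x):
    """P{YZ <= x} obtained by enumerating every pair (y, z)."""
    return sum(p * q for y, p in Y.items() for z, q in Z.items()
               if y * z <= x)


for x in (-7, -3, -2, -1, 0, 1, 2, 5, 6):
    assert product_cdf_formula(x) == product_cdf_direct(x)
```

The jump of size $G(0) - G(-0)$ in the constant term at $x = 0$ is the contribution of the event $Y = 0$, for which the product is zero whatever the value of $Z$.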


REFERENCES 

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge, 1937.

[2] E. C. Fieller, "The distribution of the index in a normal bivariate population," Biometrika, Vol. 24 (1932), pp. 428-440.

[3] R. C. Geary, "The frequency distribution of the quotient of two normal variates," Roy. Stat. Soc. Jour., Vol. 93 (1930), pp. 442-446.

[4] E. W. Hobson, The Theory of Functions of a Real Variable, Vol. 1, Cambridge, 1927.

[5] E. V. Huntington, "Frequency distribution of product and quotient," Annals of Math. Stat., Vol. 10 (1939), pp. 195-198.

[6] H. Kestelman, Modern Theories of Integration, Oxford, 1937.

[7] A. Wintner, "On the addition of independent distributions," Am. Jour. Math., Vol. 56 (1934), pp. 8-16.



SOME GENERALIZATIONS OF THE LOGARITHMIC MEAN AND OF 
SIMILAR MEANS OF TWO VARIATES WHICH BECOME 
INDETERMINATE WHEN THE TWO VARIATES ARE EQUAL 

By Edward L. Dodd
University of Texas 

1. Introduction. The logarithmic mean $m$ of positive numbers, $x$ and $y$, as given by

$$m = \frac{y - x}{\log_e y - \log_e x} = \frac{y - x}{\log_e (y/x)} \tag{1}$$

is of considerable importance in problems¹ relating to the flow of heat.

The logarithmic mean arises, moreover, in less technical problems such as the following: Given that incomes $t$ in the interval, $x \leq t \leq y$, are distributed with frequency inversely proportional to $t$. That is, with $k$ = a positive constant,

$$\phi(t)\, dt = (k/t)\, dt \tag{2}$$

is the number of individuals with incomes lying between $t$ and $t + dt$. Then, with $x > 0$, the total number $f$ of individual incomes is

$$f = \int_{x}^{y} \phi(t)\, dt = k(\log y - \log x). \tag{3}$$

The combined income $g$ of the group is

$$g = \int_{x}^{y} t\, \phi(t)\, dt = k(y - x). \tag{4}$$

And thus the logarithmic mean $g/f$ of the two numbers $x$ and $y$ in (1) is the arithmetic mean of all the incomes; that is, the average income, at least to a close approximation if the group is large enough that integration may replace summation.

Now $m$ in (1) becomes indeterminate, if $x = y$. Nevertheless, if $c > 0$, and $x \to c$ and $y \to c$, then $m \to c$. Thus, we may properly speak of $m$ as a mean of these two variates, $x$ and $y$.
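The income interpretation of the logarithmic mean is easily checked numerically. The Python sketch below computes $m$ from (1), returning the limiting value when the two arguments coincide, and compares it with the ratio $g/f$ of (3) and (4) obtained by a midpoint-rule quadrature. The function names, the bracket from 1000 to 2000, and the number of quadrature steps are illustrative assumptions.

```python
import math


def logarithmic_mean(x, y):
    """m = (y - x) / log(y / x) for positive x, y; the limit x when x == y."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if x == y:
        return x                      # limiting value as y -> x
    return (y - x) / math.log(y / x)


def average_income(x, y, steps=100_000):
    """Midpoint-rule estimate of g/f for the frequency k/t on [x, y]; k cancels."""
    h = (y - x) / steps
    ts = [x + (i + 0.5) * h for i in range(steps)]
    f = sum(1.0 / t for t in ts) * h          # total number, as in (3) with k = 1
    g = sum(t * (1.0 / t) for t in ts) * h    # combined income, as in (4) with k = 1
    return g / f


print(logarithmic_mean(1000.0, 2000.0))
print(average_income(1000.0, 2000.0))
```

Returning `x` when the arguments are equal is a design choice that mirrors the limit $m \to c$; the formula itself is indeterminate there.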

This logarithmic mean is one of a set of means studied by Renzo Cisbani², the general form being

1 See Walker, Lewis, and McAdams, Principles of Chemical Engineering, McGraw Hill & Co., Part IV, Logarithmic mean temperature difference.

2 R. Cisbani, "Contributi alla teoria delle medie," Metron, Vol. 13 (1938), pp. 23-34.




THE LOGARITHMIC MEAN 


423 


$$z = \left[ \frac{b^{i+j} - a^{i+j}}{(i/j + 1)(b^{j} - a^{j})} \right]^{1/i} \tag{5}$$

and the logarithmic mean appearing when $i = 1$, $j \to 0$.

In a chart between pages 28 and 29 Cisbani exhibits thirty varieties of these means (5). It will be noticed that $z$ is indeterminate if $a = b$.

Some methods for dealing with means which may become indeterminate 
forms I have indicated in a recent paper. 3 

Now a generalization from a mean of two variates to a mean of three or more variates may sometimes seem to be immediate. However, for the arithmetic mean $(x + y)/2$ of two variates $x$ and $y$, the function $[\min(x, y, z) + \max(x, y, z)]/2$ is as much a generalization as is the arithmetic mean $(x + y + z)/3$. Actually, the direction in which generalization is to take place is arbitrary. However, it is natural to expect the generalization to arise from a problem somewhat similar to one that may give rise to the original mean. And it is desirable that to the generalization should be carried over as many properties or characteristics of the original as is possible.

In the foregoing illustration, we considered a single interval $x \leq t \leq y$ in which incomes are distributed in accordance with a relative frequency proportional to $\phi(t)$. And the arithmetic mean of all these incomes was obtained as a logarithmic mean of the two range limits $x$ and $y$, at least approximately, allowing integration to take the place of summation. If $\phi(t)$ had been $kt^{-3/2}$, instead of $kt^{-1}$, then the average of all the incomes would have been the geometric mean of the two range limits $x$ and $y$.

To effect a first generalization, we shall now suppose an original interval $x_0$ to $x_n$, to be divided into $n$ subintervals by points $x_r$ such that

$$x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n. \tag{6}$$

For each subinterval $x_{r-1}$ to $x_r$ the same function $\phi(t)$ will be used to describe the relative frequency; but the total population for this subinterval will be controlled by a positive constant $k_r$, in general different for the different subintervals. This may be described as stratification. To make this more concrete, let us suppose, as before, that $\phi(t) = k/t$. Then, with $x_0 > 0$, the mean $M$, which will be described more in detail in the next section, will take the form

$$M = \frac{\sum_{1}^{n} k_r (x_r - x_{r-1})}{\sum_{1}^{n} k_r \log (x_r / x_{r-1})}. \tag{7}$$

Applied to incomes, $M$ would, like $m$ in (1), give average income. To get some idea of the significance of $k_r$, let us imagine that in some community there are $f_r$ individuals in the income bracket $x_{r-1}$ to $x_r$, say from \$1001 to \$2000. Let us suppose now that $f_r$ other individuals with incomes between \$1001 and \$2000 distributed in exactly the same manner move into this same community. Then $k_r$ would be changed to $k_r' = 2k_r$. But, of course, among the entire $2f_r$ individuals the relative distribution of incomes is exactly the same as among the original $f_r$ individuals.

In this interpretation $k_r$ is a weight for a bracket of items. But, taking $M$ in (7) just as it stands, $k_r$ is the weight for the consecutive pair of numbers $x_{r-1}$ and $x_r$.

3 "The substitutive mean and certain subclasses of this general mean," Annals of Math. Stat., Vol. 11 (1940), pp. 163-176. See p. 171.


2. The first generalization. When $t$ is in some interval, $I = (a, a')$, finite or infinite, let $\phi(t)$ be a non-negative, integrable function of $t$. And in $I$ let the points at which $\phi(t) = 0$, if any, form a null-set. Then, with $t$ in $I$, write

$$\Phi(t) = \int_{a}^{t} \phi(t)\, dt. \tag{8}$$

And, supposing that in (6), $a < x_0$, $x_n < a'$, set

$$f_r = \int_{x_{r-1}}^{x_r} \phi(t)\, dt = \Phi(x_r) - \Phi(x_{r-1}); \qquad r = 1, 2, \cdots, n. \tag{9}$$

Then $f_r > 0$; since $\phi(t) > 0$ and is continuous almost everywhere in $(x_{r-1}, x_r)$. Since in any finite subinterval of $I$, $t\phi(t)$ is integrable, we may set

$$\Psi(t) = \int_{a}^{t} t\, \phi(t)\, dt = \int_{a}^{t} t\, d\Phi(t), \tag{10}$$

$$g_r = \int_{x_{r-1}}^{x_r} t\, \phi(t)\, dt = \Psi(x_r) - \Psi(x_{r-1}). \tag{11}$$

Now, by a mean value theorem, there exists a number $t_r'$ such that

$$g_r / f_r = t_r', \qquad x_{r-1} < t_r' < x_r. \tag{12}$$

Taking positive numbers $k_r$, the weighted arithmetic mean of $g_r / f_r$, with weights $k_r f_r$ is then

$$M = \frac{\sum_{1}^{n} k_r g_r}{\sum_{1}^{n} k_r f_r} = \frac{\sum_{1}^{n} k_r [\Psi(x_r) - \Psi(x_{r-1})]}{\sum_{1}^{n} k_r [\Phi(x_r) - \Phi(x_{r-1})]}. \tag{13}$$

If $\phi(t) = k/t$, this becomes the mean (7) associated with the logarithmic mean. Now, since for (13) the weights $k_r f_r$ are positive, it follows from (12) that

$$x_0 < t_1' \leq M \leq t_n' < x_n. \tag{14}$$

Suppose, now, that $b$ lies in $I$, and that subject to (6) each $x_r \to b$. Then, by (14), $M \to b$. And thus $M$ is an internal mean of $x_0, x_1, \cdots, x_n$, although with the $x$'s all equal, $M$ assumes an indeterminate form.
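The mean (13) is straightforward to compute once $\Phi$ and $\Psi$ are available in closed form. The Python sketch below does so for the power family $\phi(t) = t^q$ with positive arguments, for which the lower limit $a$ contributes only an additive constant that cancels in the differences. The function names and the numerical bracket limits and weights are illustrative assumptions. With $q = -1$ the result is the stratified logarithmic mean (7).

```python
import math


def Phi(t, q):
    """An antiderivative of t**q (additive constants cancel in (13))."""
    return math.log(t) if q == -1 else t ** (q + 1) / (q + 1)


def Psi(t, q):
    """An antiderivative of t * t**q."""
    return math.log(t) if q == -2 else t ** (q + 2) / (q + 2)


def stratified_mean(xs, ks, q):
    """M of (13) for 0 < x_0 < x_1 < ... < x_n and positive weights k_1, ..., k_n."""
    if len(ks) != len(xs) - 1:
        raise ValueError("need exactly one weight per subinterval")
    pairs = list(zip(ks, xs, xs[1:]))
    num = sum(k * (Psi(b, q) - Psi(a, q)) for k, a, b in pairs)
    den = sum(k * (Phi(b, q) - Phi(a, q)) for k, a, b in pairs)
    return num / den


xs = [1000.0, 2000.0, 4000.0]     # hypothetical bracket limits
ks = [1.0, 2.0]                   # hypothetical bracket weights
M = stratified_mean(xs, ks, q=-1)
print(M)
assert xs[0] < M < xs[-1]         # the internality expressed by (14)
```

Doubling every weight leaves `M` unchanged, since only the ratios of the $k_r$ enter (13).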

In (13) the weights $k_r$ are applied to pairs of numbers, either to $\Psi(x_r) - \Psi(x_{r-1})$ or to $\Phi(x_r) - \Phi(x_{r-1})$; whereas in most weighted means, the weights are applied to individual numbers. We consider now a form equivalent to (13), but in which the weights $c_r$ are attached to the individual numbers. It seemed possible to get a more general mean than (13) by abandoning certain conditions upon the weights $c_r$ which first arose. But such relaxing of restrictions leads to difficulties, as will be shown. By setting

$$c_0 = -k_1, \qquad c_n = k_n; \qquad c_r = k_r - k_{r+1}, \quad r = 1, 2, \cdots, n - 1, \tag{15}$$

we may write $M$ in the form:

$$M = \frac{\sum_{0}^{n} c_r \Psi(x_r)}{\sum_{0}^{n} c_r \Phi(x_r)}. \tag{16}$$

On the other hand, if we choose $c$'s subject to

$$c_0 < 0, \qquad c_r < -(c_0 + c_1 + \cdots + c_{r-1}) \quad \text{for } 0 < r < n, \tag{17}$$

$$c_n = -\sum_{0}^{n-1} c_r; \tag{18}$$

then positive $k$'s can be found to pass from (16) back to (13).

The question arises whether if the conditions (17) are abandoned, and with the $c_r$ not all zero, (18) is retained as

$$\sum_{0}^{n} c_r = 0; \qquad \text{some } c_r \neq 0, \tag{19}$$

$M$ in (16) will continue to be a mean of $x_0, x_1, \cdots, x_n$, possibly an external mean.

It may be noted that the condition $\sum c_r = 0$ arises from the fact that when parentheses are removed from (13), each $k_r$ is matched by $-k_r$.

By an example, it will be shown that under (19) alone, $M$ in (16) may fail to be a mean. In (8) and (10) take $a = 0$. Then with $n = 2$, $\phi(t) = 1$, take $c_0 = 1$, $c_1 = -2$, $c_2 = 1$ in (16). Then

$$M = \frac{x_0^2 - 2x_1^2 + x_2^2}{2(x_0 - 2x_1 + x_2)}. \tag{20}$$

If $b > 0$, $\epsilon = x_0 - b$, $\eta = x_1 - b$, and $\zeta = x_2 - b$, then

$$M = b + \frac{1}{2}\, \frac{\epsilon^2 - 2\eta^2 + \zeta^2}{\epsilon - 2\eta + \zeta}. \tag{21}$$

If now $\eta = 2\epsilon$, and $\zeta = 3\epsilon + \epsilon^2$, then

$$M = b + (2 + 6\epsilon + \epsilon^2)/2 \to b + 1, \qquad \text{as } \epsilon \to 0. \tag{22}$$

Since $M$ does not approach $b$ here, when $x_0$, $x_1$, and $x_2 \to b$, in the manner specified, $M$ in (20) is not a mean of $x_0$, $x_1$, and $x_2$.
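The failure exhibited in (20) to (22) can be confirmed with exact arithmetic. The Python sketch below assumes, as a reading of the example, that $\phi(t) = 1$, so that $\Phi(t) = t$ and $\Psi(t) = t^2/2$, and it uses the weights $c = (1, -2, 1)$. The function name and the numerical value of $b$ are illustrative assumptions.

```python
from fractions import Fraction as Fr


def M20(x0, x1, x2):
    """M of (20): phi(t) = 1, a = 0, weights c = (1, -2, 1)."""
    return (x0**2 - 2 * x1**2 + x2**2) / (2 * (x0 - 2 * x1 + x2))


b = Fr(5)
for eps in (Fr(1, 10), Fr(1, 100), Fr(1, 1000)):
    # The three arguments all tend to b along the path used in (22).
    x0, x1, x2 = b + eps, b + 2 * eps, b + 3 * eps + eps**2
    value = M20(x0, x1, x2)
    assert value == b + 1 + 3 * eps + eps**2 / 2
    print(float(eps), float(value))
```

The printed values approach $b + 1$ rather than $b$, although all three arguments approach $b$ and respect the ordering (6).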

We may enquire, further, whether the function $M$ in (16) could be a mean if, discarding (13), (17) and (18), we put upon $c_r$ the single restriction $c_r > 0$. In that case, if $x_0 < t < x_n$, then, since $\Phi(t)$ and $\Psi(t)$ are continuous functions of $t$ (see (8), (10)) it would follow that if each $x_r \to t$, then $M \to \Psi(t)/\Phi(t)$. But if $M$ is to be a mean of $x_0, x_1, \cdots, x_n$, then $M \to t$ when each $x_r \to t$. Thus we are led to $\Psi(t) = t\Phi(t)$. Except possibly for points of a null set, $\Psi(t)$ and $\Phi(t)$ have derivatives $\Psi'(t)$ and $\Phi'(t)$; and thus

$$\Psi'(t) = t\Phi'(t) + \Phi(t) = t\phi(t) + \Phi(t). \tag{23}$$

But then, since $\Psi'(t) = t\phi(t)$, see (10), it would follow that $\Phi(t) = 0$ almost everywhere in $I$; but $\Phi(t) > 0$, if $t > a$. Hence the assumption $c_r > 0$ is not sufficient to make the function in (16) a mean of $x_0, x_1, \cdots, x_n$.

In the simple case of $n = 1$, $M$ becomes

$$M = \frac{\Psi(x_1) - \Psi(x_0)}{\Phi(x_1) - \Phi(x_0)}, \tag{24}$$

and this is a symmetrical function of $x_0$ and $x_1$.

The question arises whether if $n > 1$, $M$ in (13) or (16) can be a symmetrical function of $x_0, x_1, \cdots, x_n$. Assume, if possible, that with $x < y < z$,

$$H(x, y, z) = \frac{c_0 \Psi(x) + c_1 \Psi(y) + c_2 \Psi(z)}{c_0 \Phi(x) + c_1 \Phi(y) + c_2 \Phi(z)} \tag{25}$$

is a symmetrical function of $x$, $y$ and $z$. Now if $a/b = c/d$, and $b - d \neq 0$, it is well known that $a/b = (a - c)/(b - d)$.

Hence, if $H(x, y, z) = H(z, y, x)$, and $c_0 \neq c_2$, then

$$H(x, y, z) = \frac{(c_0 - c_2)[\Psi(x) - \Psi(z)]}{(c_0 - c_2)[\Phi(x) - \Phi(z)]}, \tag{26}$$

which is not symmetrical in the three variables. Then $H$ is not symmetrical in $x$, $y$ and $z$, unless, possibly, when $c_0 = c_2$.

Likewise from $H(x, y, z) = H(x, z, y)$, we are led to the conclusion that $H$ is not a symmetrical function of $x$, $y$, and $z$, unless possibly when $c_1 = c_2$. But $c_0 = c_1 = c_2$ substituted into (15) makes $k_1 = k_2 = 0$, which is contrary to hypothesis that $k_r > 0$. Then in (25) the constants $c_0$, $c_1$ and $c_2$ can not be chosen in conformity with (15) so as to make $H(x, y, z)$ a symmetrical function of the three variables.

Symmetry in two variables will appear, however, if the mean (13) reduces to a mean of just two variables as it does when each $k_r = k$, constant, in which case,

$$M = \frac{\Psi(x_n) - \Psi(x_0)}{\Phi(x_n) - \Phi(x_0)}. \tag{27}$$

Although in the generalization (13) symmetry is thus lost, another property, homogeneity, is retained in what seem to be the most important cases.

Most means $\Omega(x, y, \cdots, w)$ in common use are homogeneous functions of their arguments. That is, if $c$ is a constant, and $\Omega(x, y, \cdots, w)$ and $\Omega(cx, cy, \cdots, cw)$ are both defined when $x, y, \cdots, w$ lie in some interval $J$, then

$$\Omega(cx, cy, \cdots, cw) = c\, \Omega(x, y, \cdots, w). \tag{28}$$

This homogeneity is associated geometrically with ruled surfaces, in particular with cones.

With reference to (8) and (10), let us write

$$F(x, y) = \frac{\Psi(y) - \Psi(x)}{\Phi(y) - \Phi(x)}. \tag{29}$$

And now, let us consider a special variety of means obtained by taking in (8)

$$\phi(t) = t^q, \tag{30}$$

where $q$ is any real number. Then $F(x, y)$ is a homogeneous mean; that is,

$$F(cx, cy) = cF(x, y). \tag{31}$$

This is valid, indeed, even in the special cases, $q = 0$, $-1$, and $-2$, which lead, respectively, to the arithmetic mean, the logarithmic mean (1) and to a second variety of logarithmic mean

$$F(x, y) = \frac{xy(\log y - \log x)}{y - x} \tag{32}$$

exhibited by Cisbani. It may be noted that $q = -3/2$ leads to the geometric mean, and $q = -3$ to the harmonic mean of $x$ and $y$.
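The special cases named here follow from (29) by elementary integration, and they may be verified numerically. The Python sketch below evaluates $F(x, y)$ for $\phi(t) = t^q$ with positive arguments and checks the five named values of $q$ together with the homogeneity (31). The function name and test values are illustrative assumptions.

```python
import math


def F(x, y, q):
    """F(x, y) of (29) with phi(t) = t**q, for positive x != y."""
    def anti(t, e):
        # An antiderivative of t**e; the logarithm appears when e = -1.
        return math.log(t) if e == -1 else t ** (e + 1) / (e + 1)
    psi_diff = anti(y, q + 1) - anti(x, q + 1)   # Psi(y) - Psi(x)
    phi_diff = anti(y, q) - anti(x, q)           # Phi(y) - Phi(x)
    return psi_diff / phi_diff


x, y = 2.0, 8.0
assert math.isclose(F(x, y, 0), (x + y) / 2)                      # arithmetic
assert math.isclose(F(x, y, -1), (y - x) / math.log(y / x))       # logarithmic (1)
assert math.isclose(F(x, y, -2), x * y * math.log(y / x) / (y - x))
assert math.isclose(F(x, y, -1.5), math.sqrt(x * y))              # geometric
assert math.isclose(F(x, y, -3), 2 * x * y / (x + y))             # harmonic
assert math.isclose(F(3 * x, 3 * y, 0.7), 3 * F(x, y, 0.7))       # homogeneity (31)
```

The exponent comparisons `e == -1` are exact for the values of $q$ used here; for values of $q$ merely close to $-1$ or $-2$ the power formula loses accuracy, which is a limitation of this sketch rather than of the mean itself.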

It is conceivable that for $\phi(t)$ other functions than $t^q$ (functions not equivalent to $t^q$ in integration) might be used to lead to a homogeneous $F(x, y)$ in (29). But such functions, if any, would hardly seem to be in common use.

The $M$ in (13) retains the property of homogeneity, at least for $\phi(t) = t^q$; and so will also the more general means exhibited in the next section.


3. Further generalization. The means of Cisbani (5) suggest the following generalization. Let $p$ be an integer or the reciprocal of an odd integer. With the notation of (13), take $k_r > 0$, and

$$F_p = \sum_{1}^{n} k_r f_r^p, \qquad G_p = \sum_{1}^{n} k_r g_r^p, \tag{33}$$

$$M_p = [G_p / F_p]^{1/p}. \tag{34}$$

Indeed, if in (8) and (10), $a \geq 0$, then $g_r > 0$, and we may take for $p$ any real number except zero. Now, $M_p^p$ may be described as the weighted arithmetic mean of $(g_r / f_r)^p$ with positive weights $k_r f_r^p$. And hence $M_p$ is an internal mean of $x_0, x_1, \cdots, x_n$; that is

$$x_0 \leq M_p \leq x_n. \tag{35}$$

Furthermore, if in (8), $\phi(t) = t^q$, where $q$ is any real number, then $M_p$ is a homogeneous mean of $x_0, x_1, \cdots, x_n$.





Another generalization may be obtained by writing

$$m_r = g_r / f_r, \tag{36}$$

$$M_p' = \left[ \sum k_r m_r^p \Big/ \sum k_r \right]^{1/p}. \tag{37}$$

And still another

$$M'' = \left[ m_1^{k_1} \cdot m_2^{k_2} \cdots m_n^{k_n} \right]^{1/\sum k_r}. \tag{38}$$

These means (37) and (38) are internal; and they are homogeneous, if $F(x, y)$ in (29) is homogeneous.

The foregoing means are not, for $n > 1$, symmetrical functions of $x_0, x_1, \cdots, x_n$. Now the mere abandonment of (6) may lead to functions like (20) which are not means at all. But symmetry may be introduced as follows. First, lay aside (6), but suppose that the $x_r$ are all different. Then let

$$f_{r,s} = \int_{x_r}^{x_s} \phi(t)\, dt, \qquad g_{r,s} = \int_{x_r}^{x_s} t\, \phi(t)\, dt; \tag{39}$$

where $r = 0, 1, \cdots, (n - 1)$; $r < s \leq n$. Then, let

$$U = \sum f_{r,s}^2, \qquad V = \sum g_{r,s}^2, \tag{40}$$

where $U$ and $V$ is each a sum of $n(n - 1)/2$ terms. Let $W$ be the double-valued mean

$$W = \pm [V/U]^{1/2}. \tag{41}$$

Then $W$ is a symmetric function of $x_0, x_1, \cdots, x_n$. If, in (8), $a' \leq 0$, then in (12) each $g_r / f_r < 0$; and in (41) the negative value of $W$ is an internal mean. But the positive radical is external. On the other hand, if $a \geq 0$; then $g_r / f_r > 0$; and the positive radical in (41) is internal. In this case, it may be well to use for $W$ only the positive value of $W$.

In the more general case where $a < 0$ and $a' > 0$, the fractions $g_r / f_r$ may have different signs. But, in all cases, at least one of the two radicals (41) is an internal mean of $x_0, x_1, \cdots, x_n$. Moreover, $W$ is homogeneous, if in (8), $\phi(t) = t^q$.

Finally, let 

(42) Pl rk i ~ <?r,i//r,« , 

(43) Z - ±([2mU/n(n - l)} m . 

Then $Z$ is symmetric; and at least one value is internal. If $a > 0$, we would naturally take $Z > 0$; and this $Z$ is then an internal mean. Moreover, $Z$ is homogeneous if the $m_{r,s}$ are homogeneous; that is, if $F(x, y)$ in (29) is homogeneous for every $x$ and $y$ in $I$.



A STUDY OF R. A. FISHER’S z DISTRIBUTION AND THE RELATED 

F DISTRIBUTION 1 


By Leo A. Aroian
Hunter College 


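1. Nature of the problem. Consider two samples of $N_1$ and $N_2$ drawings, each sample drawn from one of two populations consisting of variates normally distributed with equal population variances $\sigma^2$. We define the two sample means

$$\bar x_1 = \frac{\sum_{i=1}^{N_1} x_i}{N_1}, \qquad \bar x_2 = \frac{\sum_{j=1}^{N_2} x_j}{N_2},$$

the $x_i$'s and $x_j$'s being independent variates. We calculate from the two samples

$$s_1^2 = \frac{\sum_{i=1}^{N_1} (x_i - \bar x_1)^2}{n_1} \quad\text{and}\quad s_2^2 = \frac{\sum_{j=1}^{N_2} (x_j - \bar x_2)^2}{n_2}, \qquad n_1 = N_1 - 1,\ n_2 = N_2 - 1.$$

The distribution of $z = \tfrac12 \log (s_1^2/s_2^2)$ is well known.

$$P(z) = \frac{2 n_1^{n_1/2} n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \, \frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac12 (n_1 + n_2)}} \, dz. \tag{1.1}$$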

We shall denote the ordinates by $y(z)$. The purpose of this study is to discuss the seminvariants of the $z$ distribution and also to find useful approximations for them; to show that as $n_1$ and $n_2$ approach infinity in any manner whatever the distribution of $z$ approaches normality; to find the upper bound of the absolute value of the difference between the distribution function of $z$ and the function determined by the approximate seminvariants of the distribution of $z$ for $n_1$ and $n_2$ large; to approximate the $z$ distribution by the Type III distribution, the Gram-Charlier Type A series, and the logarithmic frequency curve; and finally to investigate the same properties with respect to the $F$ distribution, where $F = e^{2z} = s_1^2/s_2^2$. The non-existence of the moments of $F$ for certain values of $n_1$ and $n_2$ is noted and explained on the basis of the distribution of the quotient $y/x$.
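As a numerical illustration of (1.1), the following minimal sketch evaluates the ordinate $y(z)$ and checks by the trapezoidal rule that the total area is one. The function names, the integration range and the step count are illustrative choices, not part of the paper.

```python
import math

def z_density(z, n1, n2):
    """Ordinate y(z) of the z distribution as written in (1.1)."""
    # log B(n1/2, n2/2) through log-gamma keeps the constant stable for large n1, n2.
    log_beta = (math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                - math.lgamma((n1 + n2) / 2))
    log_y = (math.log(2.0)
             + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
             - log_beta
             + n1 * z
             - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2.0 * z) + n2))
    return math.exp(log_y)

def total_area(n1, n2, lo=-20.0, hi=20.0, steps=40000):
    """Composite trapezoidal estimate of the integral of y(z) over [lo, hi]."""
    h = (hi - lo) / steps
    s = 0.5 * (z_density(lo, n1, n2) + z_density(hi, n1, n2))
    for i in range(1, steps):
        s += z_density(lo + i * h, n1, n2)
    return s * h
```

For instance, `total_area(4, 8)` should be very close to 1, and `z_density(z, n1, n2)` should agree with `z_density(-z, n2, n1)`, since interchanging the degrees of freedom reverses the sign of $z$.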


1 Presented to the American Mathematical Society, September 10, 1938, New York City 
in part; and to the Institute December 27, 1939 at Philadelphia. 


2. General features of the z distribution. The $z$ distribution is always unimodal, asymmetrical if $n_1 \ne n_2$, and symmetrical if $n_1 = n_2$. We see that interchanging $n_1$ and $n_2$ is the same as replacing $z$ by $-z$. Fisher [7] noted that the two parameter family of curves includes as special cases the normal curve, the $\chi^2$ distribution, and Student's distribution. The mode is at $z = 0$, the maximum ordinate is

$$y(0) = \frac{2 n_1^{n_1/2} n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right) (n_1 + n_2)^{\frac12 (n_1 + n_2)}},$$

or approximately

$$y(0) = \frac{1}{\sqrt{2\pi}} \left[\frac12 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]^{-\frac12} \tag{2.1}$$

for $n_1$ and $n_2$ large.

The two points of inflection are

$$z = \frac12 \log \frac{n_1 n_2 + n_1 + n_2 \pm \sqrt{n_1^2 + n_2^2 + 2 n_1^2 n_2 + 2 n_1 n_2^2 + 2 n_1 n_2}}{n_1 n_2}. \tag{2.2}$$

They are equidistant from the mode, a property also of the Pearson system of frequency curves [24]. Also $\lim_{n_1, n_2 \to \infty} z = 0$.

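3. The moment generating function and seminvariants. The moment generating function of the $z$ distribution is

$$M_z(\theta) = \left(\frac{n_2}{n_1}\right)^{\theta/2} \frac{\Gamma\!\left(\frac{n_1 + \theta}{2}\right) \Gamma\!\left(\frac{n_2 - \theta}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right) \Gamma\!\left(\frac{n_2}{2}\right)}. \tag{3.1}$$

The seminvariants of Thiele are defined by the following identity in $\theta$:

$$\log M_z(\theta) = \lambda_1 \theta + \lambda_2 \frac{\theta^2}{2!} + \lambda_3 \frac{\theta^3}{3!} + \lambda_4 \frac{\theta^4}{4!} + \cdots. \tag{3.2}$$

To find $\lambda_r$ we take the logarithm of the moment generating function, expand it in powers of $\theta$ and choose the coefficient of $\theta^r/r!$. A complete discussion of properties of seminvariants may be found elsewhere [4].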


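4. The seminvariants of z. Now by the following formulas [11] p. 38:

$$\log \Gamma(1 + x) = -s_1 x + s_2 \frac{x^2}{2} - s_3 \frac{x^3}{3} + s_4 \frac{x^4}{4} - \cdots, \qquad |x| < 1, \tag{4.1}$$

$$\log \Gamma(1 - x) = s_1 x + s_2 \frac{x^2}{2} + s_3 \frac{x^3}{3} + s_4 \frac{x^4}{4} + \cdots, \qquad |x| < 1, \tag{4.2}$$

where in both formulas

$$s_1 = \lim_{n \to \infty} \left(1 + \frac12 + \frac13 + \frac14 + \cdots + \frac1n - \log n\right),$$

$$s_n = \frac{1}{1^n} + \frac{1}{2^n} + \frac{1}{3^n} + \frac{1}{4^n} + \cdots, \qquad n \ge 2.$$

$$\log B\!\left(\tfrac12 [1 + x], \tfrac12\right) = \log \pi - \sigma_1 x + \sigma_2 \frac{x^2}{2} - \sigma_3 \frac{x^3}{3} + \sigma_4 \frac{x^4}{4} - \cdots, \qquad |x| < 1, \tag{4.3}$$

where

$$\sigma_n = \frac{1}{1^n} - \frac{1}{2^n} + \frac{1}{3^n} - \frac{1}{4^n} + \cdots, \qquad n \ge 1,$$

$$\sigma_n = \left(1 - \frac{1}{2^{n-1}}\right) s_n, \qquad n \ge 2.$$

Hence from (4.1) and (4.3)

$$\log \Gamma\!\left(\frac{1 + x}{2}\right) = \tfrac12 \log \pi - x \left(\sigma_1 + \frac{s_1}{2}\right) + \frac{x^2}{2} \left(\sigma_2 + \frac{s_2}{4}\right) - \frac{x^3}{3} \left(\sigma_3 + \frac{s_3}{8}\right) + \frac{x^4}{4} \left(\sigma_4 + \frac{s_4}{16}\right) - \cdots. \tag{4.4}$$

Since $\sigma_n = \left(1 - \frac{1}{2^{n-1}}\right) s_n$, $n \ge 2$, we may write (4.4) as

$$\log \Gamma\!\left(\frac{1 + x}{2}\right) = \tfrac12 \log \pi - x \left(\sigma_1 + \frac{s_1}{2}\right) + \sum_{k=2}^{\infty} \frac{(-1)^k x^k}{k} s_k \left(1 - \frac{1}{2^k}\right). \tag{4.5}$$

From (3.1)

$$\log M_z(\theta) = \log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) + \log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) + \frac{\theta}{2} (\log n_2 - \log n_1) - \log \Gamma\!\left(\frac{n_1}{2}\right) - \log \Gamma\!\left(\frac{n_2}{2}\right). \tag{4.6}$$

The results assume slightly different forms for (A) $n_1$ and $n_2$ each even; (B) $n_1$ and $n_2$ each odd; (C) $n_1$ even, $n_2$ odd; (D) $n_1$ odd, $n_2$ even. The general formula for $\lambda_{r:z}$ for all cases is

$$\lambda_{r:z} = (r - 1)! \sum_{k=0}^{\infty} \left[\frac{(-1)^r}{(n_1 + 2k)^r} + \frac{1}{(n_2 + 2k)^r}\right], \qquad r \ge 2. \tag{4.7}$$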

This result is not so useful from the point of view of numerical applications as 
the formulas which follow. 
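Although (4.7) is not convenient for hand computation, it is easy to evaluate by machine. The sketch below sums the first terms of each series directly and estimates the omitted tail with the leading Euler-Maclaurin terms; the function names and the choice of 500 explicit terms are illustrative assumptions.

```python
import math

def tail_sum(n, r, terms=500):
    """Approximate sum over k >= 0 of (n + 2k)^(-r), for r >= 2."""
    s = sum((n + 2 * k) ** (-r) for k in range(terms))
    a = n + 2 * terms  # first omitted term is a^(-r)
    # Euler-Maclaurin estimate of the omitted tail:
    # integral + half the first term + first derivative correction.
    s += a ** (1 - r) / (2 * (r - 1)) + 0.5 * a ** (-r) + r * a ** (-r - 1) / 6
    return s

def seminvariant(r, n1, n2):
    """Seminvariant lambda_r of z for r >= 2, from the series (4.7)."""
    if r < 2:
        raise ValueError("the series (4.7) applies only for r >= 2")
    return math.factorial(r - 1) * (tail_sum(n2, r) + (-1) ** r * tail_sum(n1, r))
```

For $n_1 = 4$, $n_2 = 8$ this gives $\lambda_2 \approx .2322$ and $\lambda_3 \approx -.0405$, in agreement with the values used in the numerical examples of sections 18 and 19.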

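5. Case A, $n_1$ and $n_2$ each even. From (4.6)

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = \log \left(\frac{n_2 - 2 - \theta}{2}\right) + \log \left(\frac{n_2 - 4 - \theta}{2}\right) + \cdots + \log \left(\frac{2 - \theta}{2}\right) + \log \Gamma\!\left(1 - \frac{\theta}{2}\right).$$

Now $\log \left(1 - \dfrac{\theta}{n_2 - 2}\right) = -\sum_{k=1}^{\infty} \dfrac{1}{k} \left(\dfrac{\theta}{n_2 - 2}\right)^k$. There will be $\tfrac12 (n_2 - 2)$ series of this sort, and only one series of the type $\log \Gamma\!\left(1 - \frac{\theta}{2}\right)$ as given by (4.1). In the above expansion and those succeeding, terms not involving $\theta$ are omitted, since such terms are not needed in finding the seminvariants of $z$. The series $\log \Gamma\!\left(1 - \frac{\theta}{2}\right)$ will always occur. Then

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = -\sum_{k=1}^{\infty} \frac{1}{k} \left[\left(\frac{\theta}{n_2 - 2}\right)^k + \left(\frac{\theta}{n_2 - 4}\right)^k + \cdots + \left(\frac{\theta}{2}\right)^k\right] + \log \Gamma\!\left(1 - \frac{\theta}{2}\right),$$

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = -\sum_{k=1}^{\infty} \sum_{j=1}^{\frac12 (n_2 - 2)} \frac{\theta^k}{k (2j)^k} + \sum_{k=1}^{\infty} \frac{s_k \theta^k}{k\, 2^k}. \tag{5.3}$$

We remark that the double sum is zero if $n_2 = 2$. Similarly

$$\log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) = \sum_{k=1}^{\infty} \sum_{j=1}^{\frac12 (n_1 - 2)} \frac{(-1)^{k-1} \theta^k}{k (2j)^k} + \sum_{k=1}^{\infty} \frac{(-1)^k s_k \theta^k}{k\, 2^k}. \tag{5.5}$$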

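By use of (5.3) and (5.5) we have for the seminvariants of $z$, when $n_1$ and $n_2$ are even

$$\lambda_{r:z} = \frac{(r - 1)!}{2^r} \left\{(-1)^r \left(s_r - \sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^r}\right) + \left(s_r - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^r}\right)\right\}, \qquad r \ge 2. \tag{5.6}$$

For $\lambda_{1:z} = \bar z$ we have by (4.6), (4.3), and (4.5)

$$\lambda_{1:z} = \frac12 \left[\left(\log n_2 - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac1k\right) - \left(\log n_1 - \sum_{k=1}^{\frac12 (n_1 - 2)} \frac1k\right)\right]. \tag{5.7}$$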


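6. Case B, $n_1$ and $n_2$ odd. We have

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = \log \left(\frac{n_2 - 2 - \theta}{2}\right) + \log \left(\frac{n_2 - 4 - \theta}{2}\right) + \cdots + \log \left(\frac{1 - \theta}{2}\right) + \log \Gamma\!\left(\frac{1 - \theta}{2}\right). \tag{6.1}$$

Expanding $\log \Gamma\!\left(\frac{1 - \theta}{2}\right)$ by (4.5)

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = -\sum_{k=1}^{\infty} \left[\frac{\theta^k}{k (n_2 - 2)^k} + \frac{\theta^k}{k (n_2 - 4)^k} + \cdots + \frac{\theta^k}{k}\right] + \theta \left(\sigma_1 + \frac{s_1}{2}\right) + \sum_{k=2}^{\infty} \frac{\theta^k}{k} s_k \left(1 - \frac{1}{2^k}\right). \tag{6.2}$$

However $s_k \left(1 - \dfrac{1}{2^k}\right) = \dfrac{1}{1^k} + \dfrac{1}{3^k} + \dfrac{1}{5^k} + \dfrac{1}{7^k} + \cdots$, $k > 1$, which we shall denote hereafter by $t_k$. Hence (6.2) becomes

$$\log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) = \theta \left(\sigma_1 + \frac{s_1}{2}\right) + \sum_{k=2}^{\infty} \frac{\theta^k}{k} t_k - \sum_{k=1}^{\infty} \frac{\theta^k}{k} \sum_{j=0}^{\frac12 (n_2 - 3)} \frac{1}{(2j + 1)^k}. \tag{6.3}$$

$$\log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) = \log \left(\frac{n_1 - 2 + \theta}{2}\right) + \log \left(\frac{n_1 - 4 + \theta}{2}\right) + \cdots + \log \left(\frac{1 + \theta}{2}\right) + \log \Gamma\!\left(\frac{1 + \theta}{2}\right), \tag{6.4}$$

$$\log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} \left[\frac{\theta^k}{(n_1 - 2)^k} + \frac{\theta^k}{(n_1 - 4)^k} + \cdots + \theta^k\right] + \log \Gamma\!\left(\frac{1 + \theta}{2}\right), \tag{6.5}$$

$$\log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) = -\theta \left(\sigma_1 + \frac{s_1}{2}\right) + \sum_{k=2}^{\infty} \frac{(-1)^k \theta^k}{k} t_k + \sum_{k=1}^{\infty} \frac{(-1)^{k-1} \theta^k}{k} \sum_{j=0}^{\frac12 (n_1 - 3)} \frac{1}{(2j + 1)^k}. \tag{6.6}$$

Combining both these results (6.3) and (6.6) we have

$$\lambda_{r:z} = (r - 1)! \left\{\left(t_r - \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^r}\right) + (-1)^r \left(t_r - \sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^r}\right)\right\}, \qquad r \ge 2. \tag{6.7}$$

$$\lambda_{1:z} = \bar z = \frac12 \left[\left(\log n_2 - 2 \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{2k + 1}\right) - \left(\log n_1 - 2 \sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{2k + 1}\right)\right]. \tag{6.8}$$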

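7. Cases C, D, and values of $s_k$, $\sigma_k$, $t_k$. The formulas for case C, $n_1$ even, $n_2$ odd are

$$\lambda_{r:z} = (r - 1)! \left\{\frac{(-1)^r}{2^r} \left(s_r - \sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^r}\right) + \left(t_r - \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^r}\right)\right\}, \qquad r \ge 2, \tag{7.1}$$

$$\lambda_{1:z} = \bar z = \frac12 \log \frac{n_2}{n_1} + \sigma_1 + \frac12 \sum_{k=1}^{\frac12 (n_1 - 2)} \frac1k - \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{2k + 1}. \tag{7.2}$$

The results for case D, $n_1$ odd, $n_2$ even are

$$\lambda_{r:z} = (r - 1)! \left\{(-1)^r \left(t_r - \sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^r}\right) + \frac{1}{2^r} \left(s_r - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^r}\right)\right\}, \qquad r \ge 2, \tag{7.3}$$

$$\lambda_{1:z} = \bar z = \frac12 \log \frac{n_2}{n_1} - \sigma_1 + \sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{2k + 1} - \frac12 \sum_{k=1}^{\frac12 (n_2 - 2)} \frac1k. \tag{7.4}$$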

We list the numerical values of $s_k$ and $t_k$, $k \le 10$. The values of $s_k$ are from Stieltjes [20].

(7.5)
    s_1  = 0.57721 56649
    s_2  = 1.64493 40668
    s_3  = 1.20205 69032
    s_4  = 1.08232 32337
    s_5  = 1.03692 77551
    s_6  = 1.01734 30620
    s_7  = 1.00834 92774
    s_8  = 1.00407 73562
    s_9  = 1.00200 83928
    s_10 = 1.00099 45751

(7.6)
    sigma_1 = log 2 = 0.69314 71806
    t_2  = 1.23370 05501
    t_3  = 1.05179 97903
    t_4  = 1.01467 80316
    t_5  = 1.00452 37628
    t_6  = 1.00144 70767
    t_7  = 1.00047 15487
    t_8  = 1.00015 51790
    t_9  = 1.00005 13452
    t_10 = 1.00001 70413

By means of the formula $t_k = s_k \left(1 - \dfrac{1}{2^k}\right)$, $k > 1$, $t_k$ was calculated from $s_k$.

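From the well known results for the Zeta function of Riemann $\zeta(s)$, [22], (p. 205, p. 207),

$$s_k = \zeta(k) = \sum_{n=1}^{\infty} \frac{1}{n^k} = \frac{1}{\Gamma(k)} \int_0^{\infty} \frac{x^{k-1} e^{-x}}{1 - e^{-x}} \, dx, \qquad k > 1, \tag{7.7}$$

$$\sigma_k = \left(1 - \frac{1}{2^{k-1}}\right) \zeta(k) = \frac{1}{\Gamma(k)} \int_0^{\infty} \frac{x^{k-1}}{e^x + 1} \, dx, \text{ and} \tag{7.8}$$

$$t_k = \zeta(k) \left(1 - \frac{1}{2^k}\right). \tag{7.9}$$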

8. The mean of the z distribution. From our previous formulas for $\bar z$ we prove that if $n_1 = n_2$, $\bar z = 0$, and $\bar z < 0$ for $n_2 > n_1$, $\bar z > 0$ for $n_1 > n_2$. The maximum absolute value of $\lambda_{1:z}$ will occur when $n_1 = 1$, $n_2 = \infty$, or $n_1 = \infty$, $n_2 = 1$, and from (7.4) or (6.8) we have $\max |\bar z| = \dfrac{s_1}{2} + \dfrac12 \log 2 = .6352$.
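The four formulas (5.7), (6.8), (7.2) and (7.4) for the mean can be collected into one routine. The sketch below is a compact restatement under one assumption of the editor's: each finite sum is written as the digamma function at a half-integer argument (with Euler's constant dropped, since it cancels), which is equivalent to the four separate cases. The function names are illustrative.

```python
import math

def digamma_half_plus_gamma(n):
    """psi(n/2) + Euler's constant, for a positive integer n (finite sums only)."""
    if n % 2 == 0:
        return sum(1.0 / k for k in range(1, n // 2))
    return -2.0 * math.log(2.0) + 2.0 * sum(1.0 / (2 * k + 1) for k in range((n - 1) // 2))

def exact_mean(n1, n2):
    """Mean of z; reduces to (5.7), (6.8), (7.2) or (7.4) according to parity."""
    return 0.5 * (math.log(n2 / n1)
                  + digamma_half_plus_gamma(n1) - digamma_half_plus_gamma(n2))

def approx_mean(n1, n2):
    """Three-term large-sample approximation to the mean, as in (13.1)."""
    return (0.5 * (1 / n2 - 1 / n1)
            + (1 / n2**2 - 1 / n1**2) / 6
            - (1 / n2**4 - 1 / n1**4) / 15)
```

For example `exact_mean(4, 8)` is about $-.0701$ and `exact_mean(1, 10)` about $-.5835$, the values quoted in the later numerical examples; for large $n_2$ and $n_1 = 1$ the mean approaches $-.6352$.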

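9. Formulas for $\lambda_{2:z}$, $\mu_{2:z}$, $\lambda_{3:z}$, $\mu_{3:z}$, $\lambda_{4:z}$, and $\mu_{4:z}$. We have four cases from (5.6), (6.7), (7.1), (7.3):

$$\lambda_{2:z} = \frac14 \left[2 s_2 - \sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^2} - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^2}\right] = .822467 - \frac14 \left(\sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^2} + \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^2}\right), \quad n_1, n_2 \text{ even}. \tag{9.1}$$

$$\lambda_{2:z} = 2.467401 - \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^2} + \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^2}\right), \quad n_1, n_2 \text{ odd}. \tag{9.2}$$

$$\lambda_{2:z} = 1.644934 - \frac14 \left(\sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^2} + \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(k + \frac12)^2}\right), \quad n_1 \text{ even}, n_2 \text{ odd}. \tag{9.3}$$

$$\lambda_{2:z} = 1.644934 - \frac14 \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(k + \frac12)^2} + \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^2}\right), \quad n_1 \text{ odd}, n_2 \text{ even}. \tag{9.4}$$

In all cases of course $\lambda_{2:z} > 0$ and moreover $\lambda_{2:z} \to 0$ as $n_1$ and $n_2 \to \infty$. We list

$$\lambda_{3:z} = \frac14 \left(\sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^3} - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^3}\right), \quad n_1, n_2 \text{ even}. \tag{9.5}$$

$$\lambda_{3:z} = 2 \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^3} - \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^3}\right), \quad n_1, n_2 \text{ odd}. \tag{9.6}$$

$$\lambda_{3:z} = 1.803085 + \frac14 \left(\sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^3} - \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(k + \frac12)^3}\right), \quad n_1 \text{ even}, n_2 \text{ odd}. \tag{9.7}$$

$$\lambda_{3:z} = -1.803085 + \frac14 \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(k + \frac12)^3} - \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^3}\right), \quad n_1 \text{ odd}, n_2 \text{ even}. \tag{9.8}$$

$$\lambda_{4:z} = .811742 - \frac38 \left(\sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{k^4} + \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{k^4}\right), \quad n_1, n_2 \text{ even}. \tag{9.9}$$

$$\lambda_{4:z} = 12.176136 - 6 \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^4} + \sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^4}\right), \quad n_1, n_2 \text{ odd}. \tag{9.10}$$

$$\lambda_{4:z} = 6.493939 - 6 \left(\sum_{k=0}^{\frac12 (n_2 - 3)} \frac{1}{(2k + 1)^4} + \sum_{k=1}^{\frac12 (n_1 - 2)} \frac{1}{16 k^4}\right), \quad n_1 \text{ even}, n_2 \text{ odd}. \tag{9.11}$$

$$\lambda_{4:z} = 6.493939 - 6 \left(\sum_{k=0}^{\frac12 (n_1 - 3)} \frac{1}{(2k + 1)^4} + \sum_{k=1}^{\frac12 (n_2 - 2)} \frac{1}{16 k^4}\right), \quad n_1 \text{ odd}, n_2 \text{ even}. \tag{9.12}$$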


IM-.I ► '' y 4U-J £4 1 X-jf 1 )V '' V CU. 

We see $\lambda_{r:z} > 0$ whenever $r$ is even. If $r$ is odd $\lambda_{r:z} < 0$ if $n_2 > n_1$, and $\lambda_{r:z} > 0$ if $n_1 > n_2$. Also $\mu_{r:z} > 0$, $n_1 > n_2$, $r$ odd, greater than one. Similarly $\mu_{r:z} < 0$, $r$ odd $> 1$, $n_2 > n_1$.

10. Skewness, excess, and values of $\alpha_n$. We take for our measure of skewness $\alpha_3 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\lambda_3}{\lambda_2^{3/2}}$. For $n_2 > n_1$, $\alpha_3 < 0$. Further the skewness increases negatively if $n_1$ remains constant as $n_2 \to \infty$. Thus negative skewness will be a maximum for $n_2 = \infty$, $n_1 = 1$, and positive skewness will be a maximum when $n_2 = 1$, $n_1 = \infty$. The absolute value of maximum $\alpha_3$ is

$$|\alpha_3| = \frac{2 t_3}{t_2^{3/2}} = 1.5351. \tag{10.1}$$

As our measure of kurtosis we use $\alpha_4 = \dfrac{\mu_4}{\mu_2^2} = 3 + \dfrac{\lambda_4}{\lambda_2^2}$. As a measure of excess, $E$, we use $E = \alpha_4 - 3 = \dfrac{\lambda_4}{\lambda_2^2}$. The excess is always positive.
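The definitions of skewness and excess, and the limiting value in (10.1), can be reproduced in a few lines. This is a minimal sketch; it assumes, following (6.7) and (7.3), that in the limit $n_1 = \infty$, $n_2 = 1$ the seminvariants reduce to $\lambda_2 = t_2$ and $\lambda_3 = 2 t_3$, and the function names are illustrative.

```python
import math

ZETA_3 = 1.2020569032  # s_3 as listed in (7.5)

def t_value(k, zeta_k):
    """t_k = s_k (1 - 2^-k), the sum of reciprocal k-th powers of the odd integers."""
    return zeta_k * (1.0 - 2.0 ** (-k))

def skewness(lam2, lam3):
    """alpha_3 = lambda_3 / lambda_2^(3/2)."""
    return lam3 / lam2 ** 1.5

def excess(lam2, lam4):
    """E = alpha_4 - 3 = lambda_4 / lambda_2^2."""
    return lam4 / lam2 ** 2

t2 = t_value(2, math.pi ** 2 / 6)  # 1.2337...
t3 = t_value(3, ZETA_3)            # 1.0518...
max_abs_alpha3 = skewness(t2, 2.0 * t3)
```

Here `max_abs_alpha3` evaluates to about 1.5351. For $n_1 = 4$, $n_2 = 8$, `skewness(0.232189, -0.040509)` gives about $-.362$.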

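11. Approximations for $\lambda_{r:z}$ by the Euler-Maclaurin sum formula. The exact results given previously for the seminvariants become unwieldy for $n_1$ and $n_2$ large. Hence we develop useful approximations for the seminvariants, and give the maximum error of the approximation. We find first our results for $\lambda_{r:z}$ when $n_1$ and $n_2$ are even and $r > 1$. We begin with (5.6) and rewrite this as

$$\lambda_{r:z} = \frac{(r - 1)!}{2^r} \left[\sum_{k = \frac12 n_2}^{\infty} \frac{1}{k^r} + (-1)^r \sum_{k = \frac12 n_1}^{\infty} \frac{1}{k^r}\right]. \tag{11.1}$$

Now find the two sums of (11.1) by the Euler-Maclaurin sum formula [21] using the first three terms, and obtain

$$\lambda_{r:z} = \frac{(r - 2)!}{2} \left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r} + \frac{r(r - 1)}{3} \left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right) - \frac{r(r - 1)(r + 1)(r + 2)}{45} \left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right)\right]. \tag{11.2}$$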



We use the following theorem [10] (p. 539), to find the error:

If $f(x)$ is of constant sign for $x > 0$, and together with all of its derivatives, tends monotonely to zero as $x \to \infty$, Euler's summation formula may be stated in the simplified form

$$\sum_{\nu=0}^{n} f_\nu = \int_0^n f(x) \, dx + \frac12 (f_n + f_0) + \frac{B_2}{2!} (f'_n - f'_0) + \cdots + (-1)^{k-1} \frac{B_{2k}}{(2k)!} \left(f_n^{(2k-1)} - f_0^{(2k-1)}\right) + (-1)^k \theta \frac{B_{2k+2}}{(2k + 2)!} \left(f_n^{(2k+1)} - f_0^{(2k+1)}\right),$$

where $0 < \theta < 1$ and $B_2 = 1/6$, $B_4 = 1/30$, $B_6 = 1/42$, $B_8 = 1/30$, $B_{10} = 5/66$, etc. If we use


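$$\lambda_{r:z} = \frac{(r - 2)!}{2} \left(\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right), \tag{11.3}$$

then the error committed is of the same sign and less than

$$\frac{r!}{6} \left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right).$$

If we take

$$\lambda_{r:z} = \frac{(r - 2)!}{2} \left[\left(\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right) + \frac{r(r - 1)}{3} \left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right)\right], \tag{11.4}$$

then our error is less than, and has the same sign as

$$\frac{(r + 2)!}{90} \left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right).$$

Finally if we use (11.2), our error has the same sign as, and is less than

$$\frac{(r + 4)!}{945} \left(\frac{1}{n_2^{r+5}} + \frac{(-1)^r}{n_1^{r+5}}\right).$$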


12. Approximations for other values of $n_1$ and $n_2$, $r > 1$. Now in case $n_1$ and $n_2$ are odd we have from (6.7)

$$\lambda_{r:z} = (r - 1)! \left[\sum_{k = \frac12 (n_2 - 1)}^{\infty} \frac{1}{(2k + 1)^r} + (-1)^r \sum_{k = \frac12 (n_1 - 1)}^{\infty} \frac{1}{(2k + 1)^r}\right]. \tag{12.1}$$

Applying the Euler-Maclaurin sum formula to each of the sums in (12.1) we are led to exactly the same results given in paragraph (11). The other cases are obvious combinations of the sums in (11.1) and (12.1), and so for all values of $n_1$ and $n_2$ the approximate results for $\lambda_{r:z}$, $r > 1$ are

$$\lambda_{r:z} = \frac{(r - 2)!}{2} \left(\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right) + \frac{r!}{6} \left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right) - \frac{(r + 2)!}{90} \left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right). \tag{12.2}$$

Formulas (11.1) and (12.1) prove the result previously given for $\lambda_{r:z}$ (4.7).
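The quality of the three-term approximation (12.2) can be judged by comparing it with a direct evaluation of the series (4.7). The sketch below does this; the series routine (500 explicit terms plus an Euler-Maclaurin tail estimate) and all names are illustrative assumptions rather than the author's procedure.

```python
import math

def series_seminvariant(r, n1, n2, terms=500):
    """lambda_r (r >= 2) from the series (4.7), with a tail estimate."""
    def tail_sum(n):
        s = sum((n + 2 * k) ** (-r) for k in range(terms))
        a = n + 2 * terms
        # integral of the tail + half the first omitted term + first correction
        return s + a ** (1 - r) / (2 * (r - 1)) + 0.5 * a ** (-r) + r * a ** (-r - 1) / 6
    return math.factorial(r - 1) * (tail_sum(n2) + (-1) ** r * tail_sum(n1))

def approx_seminvariant(r, n1, n2):
    """Three-term approximation to lambda_r (r >= 2), as in (12.2)."""
    sign = (-1) ** r
    first = math.factorial(r - 2) / 2 * ((n2 + r - 1) / n2 ** r
                                         + sign * (n1 + r - 1) / n1 ** r)
    second = math.factorial(r) / 6 * (1 / n2 ** (r + 1) + sign / n1 ** (r + 1))
    third = math.factorial(r + 2) / 90 * (1 / n2 ** (r + 3) + sign / n1 ** (r + 3))
    return first + second - third
```

For $n_1 = n_2 = 20$ and $r = 2$ both routines give about $.052583$; the difference is of the order of the bound $\frac{(r+4)!}{945}\left(\frac{1}{n_2^{r+5}} + \frac{(-1)^r}{n_1^{r+5}}\right)$ stated in section 11.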


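13. The approximate values of $\lambda_{1:z}$. From (5.7)

$$\lambda_{1:z} = \frac12 \left[\left(\log n_2 - \sum_{k=1}^{\frac12 n_2 - 1} \frac1k\right) - \left(\log n_1 - \sum_{k=1}^{\frac12 n_1 - 1} \frac1k\right)\right].$$

We use the Euler-Maclaurin sum formula on the sum

$$\sum_{k=1}^{\frac12 n_2 - 1} \frac1k = \sum_{k=0}^{\frac12 n_2 - 1} \frac{1}{k + 1} - \frac{2}{n_2}$$

and the similar sum involved in $\lambda_{1:z}$. Hence we have

$$\lambda_{1:z} = \frac12 \left(\frac{1}{n_2} - \frac{1}{n_1}\right) + \frac16 \left(\frac{1}{n_2^2} - \frac{1}{n_1^2}\right) - \frac{1}{15} \left(\frac{1}{n_2^4} - \frac{1}{n_1^4}\right), \tag{13.1}$$

$n_1$ and $n_2$ even, $n_1, n_2 > 2$.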


The errors committed by using one, two, or three terms of (13.1) are less than, and of the same sign respectively as



For $n_1$ and $n_2$ both odd we find the same result as (13.1). The restriction $n_1, n_2 > 2$ may easily be replaced by $n_1, n_2 \ge 2$ (for $n_1, n_2$ even) and $n_1, n_2 \ge 1$ (for $n_1, n_2$ both odd). When $n_1$ is odd, $n_2$ even, the formula is again the same as (13.1) if $n_1$ and $n_2$ are sufficiently large; but if $n_1$ and $n_2$ are small we find in this case



Another method of finding (12.2) would have been to use the asymptotic expression for $\log \Gamma(n)$.


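14. Approximate values of $\lambda_{r:z}$ for values of $r$. We list the approximate values of $\lambda_{r:z}$ to three terms.

(14.1)

$$\lambda_2 = \frac12 \left(\frac{n_2 + 1}{n_2^2} + \frac{n_1 + 1}{n_1^2}\right) + \frac13 \left(\frac{1}{n_2^3} + \frac{1}{n_1^3}\right) - \frac{4}{15} \left(\frac{1}{n_2^5} + \frac{1}{n_1^5}\right)$$

$$\lambda_3 = \frac12 \left(\frac{n_2 + 2}{n_2^3} - \frac{n_1 + 2}{n_1^3}\right) + \left(\frac{1}{n_2^4} - \frac{1}{n_1^4}\right) - \frac43 \left(\frac{1}{n_2^6} - \frac{1}{n_1^6}\right)$$

$$\lambda_4 = \left(\frac{n_2 + 3}{n_2^4} + \frac{n_1 + 3}{n_1^4}\right) + 4 \left(\frac{1}{n_2^5} + \frac{1}{n_1^5}\right) - 8 \left(\frac{1}{n_2^7} + \frac{1}{n_1^7}\right)$$

$$\lambda_5 = 3 \left(\frac{n_2 + 4}{n_2^5} - \frac{n_1 + 4}{n_1^5}\right) + 20 \left(\frac{1}{n_2^6} - \frac{1}{n_1^6}\right) - 56 \left(\frac{1}{n_2^8} - \frac{1}{n_1^8}\right)$$

$$\lambda_6 = 12 \left(\frac{n_2 + 5}{n_2^6} + \frac{n_1 + 5}{n_1^6}\right) + 120 \left(\frac{1}{n_2^7} + \frac{1}{n_1^7}\right) - 448 \left(\frac{1}{n_2^9} + \frac{1}{n_1^9}\right)$$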

The approximate values given by Cornish and Fisher [8] (p. 319), are similar, but have fewer terms. Cornish and Fisher give no remainder term. From (14.1) and (12.2) we see the maximum absolute values of $\lambda_{2r+1:z}$, $r \ge 1$, occur when $n_2 = \infty$, $n_1 = 1$, or $n_2 = 1$, $n_1 = \infty$. Similarly $\lambda_{2r:z}$, $r \ge 1$, has its maximum value for $n_1 = n_2 = 1$. The standard seminvariants of $z$ are defined $\xi_{r:z} = \dfrac{\lambda_{r:z}}{\sigma_z^r}$. We also note that for $n_2 > n_1$, $\xi_{2r+1:z} < 0$, $r \ge 1$ and hence $\alpha_{2r+1} < 0$ also, where $\alpha_n = \dfrac{\mu_n}{\mu_2^{n/2}}$. Moreover the maximum absolute values of $\xi_{2r:z}$ and $\xi_{2r+1:z}$ occur when $n_1 = 1$, $n_2 = \infty$ or $n_2 = 1$, $n_1 = \infty$, and also for $\alpha_{2r}$ and $\alpha_{2r+1}$. Approximately then

(14.2) • max = ( -l) r , r ^ 2. 


The exact value for maximum 014.2 is 3 



7.07. 


15. Approach to normality of the z distribution. We prove the theorem: The distribution of $z$ approaches normality as $n_1$ and $n_2 \to \infty$ in any manner whatever, with $\bar z = \dfrac12 \left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$, $\sigma_z^2 = \dfrac12 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$. We also find an upper bound of the absolute value of the difference between the $z$ distribution and the function determined by the approximate seminvariants of $z$ when $n_1$ and $n_2$ become large. To prove the theorem we start with the original distribution of $z$, and find when $n_1$ and $n_2$ are large,

$$P(z) = \frac{1}{\sqrt{2\pi}\, \sigma_z} \left(\frac{n_1 + n_2}{n_1 e^{2z} + n_2}\right)^{\frac12 (n_1 + n_2)} e^{n_1 z} \, dz. \tag{15.1}$$

We change to standard units $z = t \sigma_z + \bar z$, then





(I52) w-jslspZ-h}* . v '.** 


» < l < oo. 


We rewrite this as 

h ,o\ P m=J / ni + Th V tn,,, ’ 5, * 

{10,0) 1(1) ^ fljg -2n 1 O.U./( , . I * nj )J “ c - 

Expand nLC Jn,( " +,,,(niMtl and •»>“"“ "»> and add term by term. Divide 

this result by m + m from the numerator of P(l) to obtain 


(15.4) 
Hence 

(15.5) 


1 _L + *Y t n 

1 + " (n t -f rh) 1 + 0: 


F(0 = 


\72i 


1 


1 {(ni 4-n*)*} ‘ 
2nin a (tff + g) 5 V' t(ni,n ” 


(Wi 4~ Tta) 3 


rib 


We evaluate (15.5) for m and « a large by using logarithms. 


m 4- ft-i 


log U 4- 


2ni nj(hr + z) J 
(n i + n*)* 


+ n% 
2 


! ,[ 7 2 n 1 n 1 (kr + 5) a \ _ 1 f2n»- 
Ll («i + na) J / 2 \ (r 


n 2 (/tr 4- 2) 

4- na) J / 2 \ (n, 4* ns) 3 


At 


(p (—T) rtl / 2 njnj(/tr 4 - 2 ) 

\ (m 4* m) s ” 


4- 22 *• 

r-> r 


n 


This gives 

- £ «v+ 2 m+*■) + «„ + «r +£ (-D- ■ 


2r(ni 4- n I ) Jr " 1 


We reduce this then to 


_ f _ -i-, _ (2v ~‘) a , 1 / 2nln$ \ (l a + 2)* 

2 2 2 \(iu + 7h) 2 J Hi -f* ivi 

4- terms involved in the above summation. Let U = a~'i < a. Since 


2 J tr * 


lim a = 0, lim V «= Q. Similarly lim 

8 ider,-riri_ (1 ,+ »+W 


f / 5 


(ni +n 5 ) 3 
0. In like fashion 


= Lim ^ «0. Con- 
__ _ _,_____ Hence lim 4* t/) ^ 

4(ni4-n5) 4(7114-7^)’ ni,«s~»4(ni4-7ia) 


V (rlT ( V (iv 4- i) tr = J (-X)V~ ir (hr4-g) ir 
£a 2r \ni 4* 7hJ (Tii 4-7k) r_l £a ~ 27(713 4- 

Now clearly from our previous discussion for r = 2, we see 





am i tn ^ ■ o. 

nj,nj -»00 r —8 27* (?li -|- 7lj) r—1 

This completes the proof. 

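We now consider the function, $f(z)$, determined by the approximate seminvariants of $z$. We start with

$$\lambda_{1:z} = \frac12 \left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad\text{and}\quad \lambda_{r:z} = \frac{(r - 2)!}{2} \left\{\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right\}, \qquad r > 1,$$

from (12.2) using only the first term. We may easily prove then that as $n_1$ and $n_2$ approach infinity in any manner whatever the function $f(z)$ represents a normal frequency distribution with

$$\bar z = \frac12 \left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad\text{and}\quad \mu_{2:z} = \frac12 \left(\frac{n_2 + 1}{n_2^2} + \frac{n_1 + 1}{n_1^2}\right).$$

This further shows the identity of $f(z)$ and $y(z)$ in the limit as $n_1$ and $n_2 \to \infty$. Since the moment generating function of $f(z)$ is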


we have 
(15 6) /(*) 


o-sr~( 


1 + ~) 

nj 


Km— 1 +®) 


— !" e-*‘(\ - (l + ^Y (ni ~ i+,s) 

2ir J-x \ rw) \ rii, 


d6 


00. 


I have not been able to evaluate (15.6). We instead shall find an upper bound to the difference $|f(z) - y(z)|$ as $n_1$ and $n_2$ become large. We form $f(z) - y(z)$. Then by use of Stirling's formula for $n!$ with the remainder term and by the Fourier Integral Theorem,


(15.7) | /(*) - y(z) | g _ 1 )y{z) W here 0 < ft < 1, 0 < ft < 1, 


and 


$$\lim_{n_1, n_2 \to \infty} |f(z) - y(z)| = 0, \text{ and for this case } f(z) = y(z). \tag{15.8}$$

Of course (15.7) furnishes the upper bound of the absolute value of the difference between the frequency distribution of $z$ and the function determined by the approximate seminvariants of $z$ for any values of $n_1$ and $n_2$.

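Up to this point we have assumed that there exists a function determined by the seminvariants

$$\lambda_{1:z} = \frac12 \left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad\text{and}\quad \lambda_{r:z} = \frac{(r - 2)!}{2} \left(\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right), \qquad r > 1.$$

This may readily be proved by using the following theorem [18] (p. 536): The determined character of the moments problem for an infinite interval is insured if $\sum_{n=1}^{\infty} c_{2n}^{-1/2n}$ diverges $\left(c_n = \int_{-\infty}^{\infty} x^n \, d\psi(x)\right)$.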


16. The Pearson types of approximating curve. In discussing the types of the Pearson system which may be expected to approximate the $z$ distribution we shall use the results of H. C. Carver [1], and the further exposition of C. C. Craig [3]. To find the Pearson type we compute $\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$. We shall find it convenient to use the approximations

$$\alpha_3 = \frac{\sqrt2\, (n_1 - n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}} \quad\text{and}\quad \alpha_4 = 3 + \frac{4 (n_1^2 - n_1 n_2 + n_2^2)}{n_1 n_2 (n_1 + n_2)}$$

to obtain

$$\delta = \frac{(n_1 + n_2)^2}{3 n_1^2 n_2 + 3 n_1 n_2^2 + 2 n_1^2 - 2 n_1 n_2 + 2 n_2^2} \tag{16.1}$$

and consequently $0 < \delta < 1$. The only possibilities are Types IV, VII, VI, or V since the greatest value of $\alpha_3^2$ by (14.1) is 2.3565. Now if $n_1 = n_2$, we have Type VII, since $\alpha_3 = 0$, $\delta > 0$. In all other cases we shall have Types IV, V, or VI according as $\alpha_3^2 < 4\delta(\delta + 2)$, $\alpha_3^2 = 4\delta(\delta + 2)$, $\alpha_3^2 > 4\delta(\delta + 2)$. We neglect $\delta^2$. Hence $\alpha_3^2 < 8\delta$ implies

$$n_2^4 (n_1 - 2) + n_2^3 (15 n_1^2 + 6 n_1) + n_2^2 (15 n_1^3 - 8 n_1^2) + n_2 (n_1^4 + 6 n_1^3) - 2 n_1^4 > 0. \tag{16.2}$$

A simple investigation reveals then the following results:

(16.3)
    Type IV for $n_1, n_2 \ge 2$, $n_1 \ne n_2$.
    Type IV for $n_2 = 1$, $1 \le n_1 \le 21$; or $n_1 = 1$, $1 \le n_2 \le 21$.
    Type VI for $n_1 = 1$, $n_2 \ge 22$; for $n_2 = 1$, $n_1 \ge 22$.
    Type VII for $n_1 = n_2$.

Clearly the $z$ distribution has features comparable to Type IV since both have infinite range. However, Type IV is irksome to fit in practice.
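The classification in (16.3) follows mechanically from (16.1) and (16.2), so it can be tabulated by a short routine. The following is a sketch under the approximations used in the text (first-term values of $\alpha_3$ and $\alpha_4$, and $\delta^2$ neglected); the function names are illustrative.

```python
def delta(n1, n2):
    """Criterion delta from the approximation (16.1)."""
    return (n1 + n2) ** 2 / (3 * n1**2 * n2 + 3 * n1 * n2**2
                             + 2 * n1**2 - 2 * n1 * n2 + 2 * n2**2)

def criterion_16_2(n1, n2):
    """Left-hand side of (16.2); positive when alpha_3^2 < 8 delta."""
    return (n2**4 * (n1 - 2) + n2**3 * (15 * n1**2 + 6 * n1)
            + n2**2 * (15 * n1**3 - 8 * n1**2)
            + n2 * (n1**4 + 6 * n1**3) - 2 * n1**4)

def pearson_type(n1, n2):
    """Approximating Pearson type for integer degrees of freedom n1, n2."""
    if n1 == n2:
        return "VII"  # alpha_3 = 0, delta > 0
    q = criterion_16_2(n1, n2)
    if q > 0:
        return "IV"
    if q == 0:
        return "V"
    return "VI"
```

For example `pearson_type(1, 21)` returns `"IV"` while `pearson_type(1, 22)` returns `"VI"`, and `delta(10, 5)` is about .094, the value quoted in section 17.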


17. The Type III approximating curve, the logarithmic curve, and the Gram-Charlier Type A. The criterion for Type III is $\delta = 0$, $\alpha_3 \ne 0$. We see that as $n_1$ and $n_2$ increase the value of $\delta$ will decrease. Even for small values of $n_1$ and $n_2$ Type III will furnish a fair approximation to the $z$ distribution. For example $n_1 = 10$, $n_2 = 5$, $\delta = .094$. The advantage of the Type III approximation rests on the fact that Salvosa's tables may be used. From the chart in [16] since $\alpha_3^2 \le 2.3565$, we are assured that the approximating Type III curve is bell shaped. For $n_1 = 1, 2$, $n_2$ = any value, this approximation is not all that could be desired, although even in such cases it does have value. We note that Type III has limited range at one extreme $\left(-\dfrac{2}{\alpha_3}, \infty\right)$ while the range of the $z$ distribution is $(-\infty, \infty)$. Salvosa's tables extend as far as $\alpha_3 = 1.1$, and since $\max \alpha_3 = 1.5351$, we see in some cases, and these only for $n_1 = 1$, $n_2$ large, we shall be obliged to make use of Pearson's Tables of the Incomplete Gamma Function [14]. The logarithmic frequency curve

$$f(u) = \frac{1}{\sqrt{2\pi}\, c (u - a)} \exp \left[-\frac{1}{2c^2} \left(\log \frac{u - a}{b}\right)^2\right]$$

will be useful in approximating the $z$ distribution. While it has been discussed by many authors we shall follow Pae-Tsi Yuan [23], where a full bibliography may be found. In our discussion we use the $\beta_1 = \alpha_3^2$, $\beta_2 = \alpha_4$ chart of the Pearson system as given by S. J. Pretorius [16] (p. 147), since the logarithmic frequency locus connecting $\alpha_3^2$ and $\alpha_4$ is already drawn in. The justification of this curve for fitting is due to the fact that in the $\beta_1$, $\beta_2$ chart of the Pearson system as given by S. J. Pretorius [16] (p. 147), the logarithmic frequency locus lies in the Type VI region between the Type III locus and the Type V locus, and consequently closer to the Type IV region than Type III itself does. Hence since Type III fits fairly well under certain conditions and Type IV fits well we can expect the same for the logarithmic curve. Furthermore when $\alpha_3$ is small the logarithmic curve is similar to Type III [23] (p. 42), and as $\alpha_3$ becomes larger, $\alpha_3 = 1$, the difference between the two types is pronounced. However, it is just when $\alpha_3$ becomes large in the region $n_1 = 1$, $n_2 \ge 22$ that we find the logarithmic curves give a fine fit, since in such cases the point $(\alpha_3^2, \alpha_4)$ lies practically on the logarithmic locus [16]. To fit the curve [23] (pp. 37, 48, 49), we find the values of the three parameters $a$, $b$, $c$. To find $c$ we solve the equation $w^3 + 3w^2 - (4 + \alpha_{3:z}^2) = 0$ for $w$ using the table [23] (p. 48) given by Pae-Tsi Yuan. Knowing $w$ we can easily solve for


(17.1) 


c = (log ia)*, 

(w + 2 )(7, 


r / u) + 

\ a 3 * 


2\ -4 

W 0 - j , 


a = 2 — 


t = 


3 — 2 C 


xc— jc 2 


<*a 2 


V* (e c “ - l) 1 


where the value of $x$ must be obtained from the table of areas under the normal curve, if the $z$ distribution is approximated by use of areas.

Since the Gram-Charlier Type A series generally approximates a Pearson Type IV fairly well when $\alpha_3^2$ is not too large, it is to be expected that the Type A series will approximate the $z$ distribution in those cases when $n_1 = n_2$, and also when $\alpha_3^2$ is not too large.





18. Levels of significance and approximation methods. We shall apply the results of the previous paragraphs to the determination of the value of $z$ for any level of significance $\alpha$, i.e. the value of $z$ such that $\int_{-\infty}^{z} y(z) \, dz = 1 - \alpha$. We have such levels as the median (the 50% point of significance), the 20%, 5%, 1%, and .1% points as given in [9]. Where these tables apply there is no need for other methods. It would be desirable to extend the results for any level of significance whatever. The methods which we shall use are (1) the logarithmic frequency curve, (2) the Gram-Charlier Type A, and (3) the Type III approximation. For finding the levels of significance by the Incomplete Beta function, the reader is referred to [13], (p. lviii, topic (via)). The logarithmic curve is very simple to use in conjunction with the table of areas under the normal curve. From Pae-Tsi Yuan we have

$$t = \frac{e^{xc - \frac12 c^2} - 1}{(e^{c^2} - 1)^{\frac12}}, \tag{18.1}$$

where $(e^{c^2} - 1)^{\frac12}$ takes the same sign as $\alpha_3$. The value of $x$ is obtained from the table of the normal curve, 1.64 for the 5% level, 2.33 for the 1% level; the value of $c$ is obtained from $w$ (17.1), and consequently the value of $t$ (18.1). Then we have, if $z_\alpha$ = value of $z$ for any level of significance, $t = \dfrac{z_\alpha - \bar z}{\sigma_z}$ to solve for $z_\alpha$, where $\bar z$ and $\sigma_z$ are the values of the mean and standard deviation of $z$ as given by the proper formulas in (5), (6), (7). We illustrate with examples:

(18.2) 5% point of $z$, $n_1 = \infty$, $n_2 = 1$. $\alpha_3 = 1.5351$, $w = 1.2264$, $x = 1.64$, $t = 1.88$, $\bar z = .6352$, $\sigma_z = 1.11$, and as a result $z_{5\%} = 2.72$. Fisher [9] gives 2.7693.

We can also find $z_{5\%}$ easily for $n_1 = 1$, $n_2 = \infty$. Here $\alpha_3 = -1.5351$, $w = 1.2264$, $x = -1.64$, $t = 1.197$, $\bar z = -.6352$, $\sigma_z = 1.11$, $z_{5\%} = .694$ compared with Fisher [9] $z_{5\%} = .6729$.

(18.3) 1% point for $n_1 = 4$, $n_2 = 8$, $\bar z = -.0701$, $\sigma_z = .4819$, $\alpha_{3:z} = -.3619$, $w = 1.0144$, $t = 2.17$ and $z_{1\%} = .976$, while the accurate result is .9734.
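The steps for the logarithmic curve (solve the cubic for $w$, take $c = (\log w)^{1/2}$, then apply (18.1)) may be sketched as below and checked against example (18.2). Two points are assumptions of this sketch: the cubic is solved by bisection instead of Yuan's table, and the sign convention for negative $\alpha_3$ is folded into the routine, so the caller always passes the positive normal deviate $x$. All names are illustrative.

```python
import math

def solve_w(alpha3):
    """Root w >= 1 of w^3 + 3 w^2 - (4 + alpha3^2) = 0, found by bisection."""
    target = 4.0 + alpha3 ** 2
    lo, hi = 1.0, 1.0 + alpha3 ** 2  # the cubic is increasing and changes sign here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 3 + 3.0 * mid ** 2 < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lognormal_t(x, alpha3):
    """Standardized deviate t of (18.1) for the upper-tail normal deviate x > 0."""
    if alpha3 == 0:
        return x  # the logarithmic curve degenerates to the normal curve
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))
    s = math.copysign(1.0, alpha3)  # the radical takes the sign of alpha_3
    return (math.exp(s * x * c - 0.5 * c * c) - 1.0) / (s * math.sqrt(w - 1.0))

def z_point(x, mean, sigma, alpha3):
    """Percentage point z_alpha = mean + t * sigma."""
    return mean + lognormal_t(x, alpha3) * sigma
```

With $x = 1.64$, `z_point(1.64, 0.6352, 1.11, 1.5351)` gives about 2.72, and `z_point(1.64, -0.6352, 1.11, -1.5351)` about .694, as in (18.2).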

From experience the values of z for any level of significance obtained by the loga¬ 
rithmic frequency curve will possess an error less than 2% of the true value of z 
for the level of significance if tu and are greater than twenty. It would 
seem that for other values of n, and fh the error could not be greater than 10%, 
and usually would be much less. 


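19. The Gram-Charlier Type A. We take the series in the form

$$F(t) = \phi(t) + A_3 \phi'''(t) + A_4 \phi^{iv}(t), \qquad \phi(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac12 t^2},$$

$$t = \frac{z - \bar z}{\sigma_z}, \qquad A_3 = -\frac{\alpha_{3:z}}{3!}, \qquad A_4 = \frac{\alpha_{4:z} - 3}{4!}.$$

Some examples follow.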



(19.1) We use the material of (18.3) and employ three terms of $F(t)$. $\bar z = -.0701$, $\sigma_z = .4819$, $\lambda_{3:z} = -.0405$, $\lambda_{4:z} = .0336$, $A_3 = .06032$, $A_4 = .02596$.

Fitting $F(t)$ by ordinates we have $t = 2.17$, and consequently $z = .976$.

(19.2) We take $n_1 = n_2 = 5$, $\bar z = 0$, $\sigma_z = .4952$, $\lambda_{3:z} = 0$, $\lambda_{4:z} = .02798$, $A_3 = 0$, $A_4 = .01939$.

5% point: By ordinates $t = 1.57$, $z_{5\%} = .777$, while Fisher gives .8097.

1% point: By ordinates $t = 2.325$, $z_{1\%} = 1.15$, while Fisher gives 1.1974.

(19.3) We take $n_1 = 3$, $n_2 = 20$, $\bar z = -.15909$, $\sigma_z = .5099$, $\lambda_{3:z} = -.10222$, $\lambda_{4:z} = .08822$, $A_3 = .12854$, $A_4 = .05438$. By ordinates $t = 1.523$, $z_{5\%} = .618$, Fisher gives .5654. $t = 1.989$, $z_{1\%} = .855$, Fisher gives .7985. The Gram-Charlier Type A is recommended only for $n_1 = n_2$ and $n_1, n_2 \ge 20$.


20. Type III approximation, the median, and 5% point. Since for Type III the median, $m_e$, lies approximately two-thirds of the distance from the mode to the mean if $\alpha_3$ is moderate [12], [6], then we have, further assuming $n_1, n_2 \ge 20$,

$$m_e = \frac13 \left(\frac{1}{n_2} - \frac{1}{n_1}\right) + \frac19 \left(\frac{1}{n_2^2} - \frac{1}{n_1^2}\right). \tag{20.1}$$

From experience this result will furnish an accuracy with an error less than 2% of the true value in the range above indicated.

$$t_{5\%} = 1.6437 + .2760 \alpha_3 - .04506 \alpha_3^2. \tag{20.2}$$

This was found by use of Salvosa's tables and for $\alpha_3 > 1.1$ by [14].

$$z_{5\%} = \sigma_z [1.644 + .2760 \alpha_{3:z} - .0451 \alpha_{3:z}^2] + \bar z. \tag{20.3}$$

We illustrate the use of (20.3) with some examples.

(20.4) $n_1 = n_2 = 1$, $\sigma_z = 1.5708$, $\alpha_{3:z} = 0$, $\bar z = 0$, $z_{5\%} = 2.582$, while the accurate value is $z_{5\%} = 2.5421$.

(20.5) $n_1 = \infty$, $n_2 = 1$, $\alpha_3 = 1.5351$, $\bar z = .6352$, $\sigma_z = 1.11$, $z_{5\%} = 2.81$. The accurate value is 2.7693.

(20.6) $n_1 = n_2 = 5$, $\sigma_z = .4952$, $\alpha_{3:z} = 0$, $\bar z = 0$, $z_{5\%} = .8141$, while the accurate value is $z_{5\%} = .8097$.

(20.7) $n_1 = 4$, $n_2 = 8$, $\bar z = -.0701$, $\sigma_z = .4819$, $\alpha_3 = -.3619$, $z_{5\%} = .6712$, while the accurate value is .6725.

(20.8) $n_1 = 1$, $n_2 = 10$, $\bar z = -.5835$, $\sigma_z = 1.1353$, $\alpha_3 = -1.4333$, $z_{5\%} = .7283$, while the accurate value is .8012.
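Formula (20.3) is a one-line computation, and the examples (20.4) to (20.8) can be reproduced directly from the quoted values of $\bar z$, $\sigma_z$ and $\alpha_3$. A minimal sketch follows; the function name is illustrative.

```python
def type3_five_percent_point(mean, sigma, alpha3):
    """5% point of z from the Type III approximation (20.3)."""
    t = 1.644 + 0.2760 * alpha3 - 0.0451 * alpha3 ** 2
    return sigma * t + mean

# Inputs (mean, sigma, alpha_3) taken from examples (20.4) to (20.8).
examples = {
    "20.4": (0.0, 1.5708, 0.0),
    "20.5": (0.6352, 1.11, 1.5351),
    "20.6": (0.0, 0.4952, 0.0),
    "20.7": (-0.0701, 0.4819, -0.3619),
    "20.8": (-0.5835, 1.1353, -1.4333),
}
results = {key: type3_five_percent_point(*args) for key, args in examples.items()}
```

The entries of `results` agree with the values stated in the examples to about three decimals, for instance about .7283 for (20.8) and about .8141 for (20.6).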

In a future paper exactly the same methods will be used for any per cent point of $z$ whatever in order to compare with the results of W. G. Cochran [2]. If $n_1$ and $n_2$ are large we may use the approximate formulas for $\sigma_z$, $\alpha_{3:z}$, and $\bar z$ to obtain to the order of $\sigma_z^2$,

$$z_{5\%} = 1.644 \sigma_z + .7760 \left(\frac{1}{n_2} - \frac{1}{n_1}\right), \quad\text{where } \sigma_z = \sqrt{\frac12 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}. \tag{20.9}$$

We expand Fisher's result [9]

$$z_{5\%} = \frac{1.645}{\sqrt{h - 1}} + .7843 \left(\frac{1}{n_2} - \frac{1}{n_1}\right)$$

by the binomial theorem, where $h = \dfrac{1}{\sigma_z^2}$, to obtain a comparable result

$$z_{5\%} = 1.645 \sigma_z + .7843 \left(\frac{1}{n_2} - \frac{1}{n_1}\right). \tag{20.10}$$

The numerical examples given in this chapter illustrate unfavorable cases as well as favorable ones.


21. The distribution of F. Historically Snedecor [19] was the first to use $F$ for $e^{2z}$. We find

$$P(F) = \frac{n_1^{n_1/2} n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \, \frac{F^{\frac12 (n_1 - 2)}}{(n_1 F + n_2)^{\frac12 (n_1 + n_2)}} \, dF, \qquad 0 \le F < \infty. \tag{21.1}$$

The distribution of $F$ is J shaped if $n_1 \le 2$, and bell shaped for $n_1 > 2$, and for $n_1 > 2$ one mode exists, $F_0 = \dfrac{n_2 (n_1 - 2)}{n_1 (n_2 + 2)}$. The two points of inflection, which exist for $n_1 > 4$, are equidistant from the mode. The moments are

$$\bar F = \frac{n_2}{n_2 - 2}, \quad n_2 > 2; \qquad \alpha_{3:F} \approx \frac{2\sqrt2\, (2 n_1 + n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}};$$

$$\mu_{2:F} = \frac{2 n_2^2 (n_1 + n_2 - 2)}{n_1 (n_2 - 2)^2 (n_2 - 4)} \approx 2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right).$$

The exact results for $\mu_3$, $\mu_4$, $\alpha_3$, and $\alpha_4$ are omitted because of length. We have the theorem that as $n_1, n_2 \to \infty$ in any manner whatever the distribution of $F$ approaches normality with mean $\bar F = 1$, $\sigma_F^2 = 2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$. The proof is omitted. The only type of approximating curve of any value is Type III. Of course the distribution of $F$ is Type VI. No tables exist for Type VI. Furthermore the $F$ distribution approaches the Type III function so slowly as to make most approximations of little value unless $\alpha_{3:F} \le 1.1$. Other possible





parameters are * - F, and H - , [131 Since | «.„ | = 

2 | <**•« | approximately we see that the distribution of II is more skewed than 

that of z. We mention briefly also S\ - S\ where S{ = — si, S? = — s? . 

Ah A /2 

Clearly $z$, $F$, and $H$ give equivalent levels of significance. This is not true for $z$ and $S_1^2 - S_2^2$.
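The relation between (21.1) and (1.1) under $F = e^{2z}$, and the position of the mode $F_0$, can be verified numerically. The sketch below assumes the forms of the two densities as written in (1.1) and (21.1); function names are illustrative.

```python
import math

def _log_beta(n1, n2):
    return math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)

def z_density(z, n1, n2):
    """Ordinate of the z distribution, equation (1.1)."""
    return math.exp(math.log(2.0) + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
                    - _log_beta(n1, n2) + n1 * z
                    - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2.0 * z) + n2))

def f_density(F, n1, n2):
    """Ordinate of the F distribution, equation (21.1), for F > 0."""
    return math.exp(0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
                    - _log_beta(n1, n2) + (0.5 * n1 - 1.0) * math.log(F)
                    - 0.5 * (n1 + n2) * math.log(n1 * F + n2))

def f_mode(n1, n2):
    """Mode F_0 = n2 (n1 - 2) / (n1 (n2 + 2)), valid for n1 > 2."""
    return n2 * (n1 - 2) / (n1 * (n2 + 2))
```

Since $z = \tfrac12 \log F$ gives $dz = dF / (2F)$, `f_density(F, n1, n2)` should equal `z_density(0.5 * math.log(F), n1, n2) / (2 * F)`; and the ordinate at `f_mode(6, 10)` should exceed the ordinates on either side.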

Finally, since $F = \dfrac{s_1^2}{s_2^2}$, it may be interpreted as a quotient [5]. When the moments of $F$ do not exist, it is due to the distribution function of $s_2^2$.

22. Conclusion. We have found the seminvariants for the $z$ distribution, and approximations for them. Type III, and the logarithmic normal frequency functions are shown to be excellent approximations to the $z$ distribution. The approach to normality for the $z$ distribution is proved. A formula is given for finding the 5% level of significance for $z$. The $F$ distribution is studied along the same lines. As far as the construction of tables for levels of significance is concerned, the $z$ distribution is much easier to use. My sincerest thanks are due Professor C. C. Craig for his helpful guidance and many suggestions.

BIBLIOGRAPHY 

[1] H. C. Carver, Handbook of Mathematical Statistics, H. L. Rietz, ed., Boston: Houghton-Mifflin Co., 1924. Chapter on frequency curves.

[2] W. G. Cochran, "Note on an approximate formula for the significance levels of z," Annals of Math. Stat., Vol. 11 (1940), pp. 93-95.

[3] C. C. Craig, "A new exposition and chart for the Pearson system of frequency curves," Annals of Math. Stat., Vol. 7 (1936), pp. 16-28.

[4] C. C. Craig, "An application of Thiele's semi-invariants to the sampling problem," Metron, Vol. 7 (1928-29), pp. 3-74.

[5] C. C. Craig, "The frequency function of y/x," Annals of Math., Second Series, Vol. 30 (1929), pp. 471-486.

[6] A. T. Doodson, "Relation of a mode, median, and mean in a frequency curve," Biometrika, Vol. 11 (1917), p. 425.

[7] R. A. Fisher, "On a distribution yielding the error functions of several well known statistics," Proc. International Math. Cong., 1924, Toronto, Vol. 2, pp. 805-813.

[8] R. A. Fisher and E. A. Cornish, "Moments and cumulants in the specification of distributions," Revue de l'Institut International de Statistique, 5th year, pp. 307-20, 1937, La Hague.

[9] R. A. Fisher and Yates, Statistical Tables for Biological, Agricultural, and Medical Research, London: Oliver and Boyd, 1938.

[10] K. Knopp, Theory and Application of Infinite Series, English translation, Edinburgh: Blackie and Son, 1928.

[11] N. Nielsen, Handbuch der Theorie der Gammafunktion, Leipzig: Teubner, 1906.

[12] C. A. Olshen, "Transformation of the Pearson Type III distribution," Annals of Math. Stat., Vol. 9 (1938), pp. 176-200.

[13] K. Pearson (Editor), Tables of the Incomplete Beta Function, London: Biometrika Office, University College, London, 1934.

[14] K. Pearson (Editor), Tables of the Incomplete Gamma Function, London: His Majesty's Stationery Office, 1922.



448 


LEO A. AKOIAN 


[16] K. Pearson, S. A Stoueeer, and F. >J. David, "Further applications in stiUiHtica of 
the T„(x) Beaaol function,” Ihomclrika, Vol. 2*1 (1 ( J32), pp. 293 350. 

[16] 8. J. Pretoriuh, “Skew bivariate frequency curves ess mined in the light of numerical 

illustrations,” Biomelrika, Vol. 22, (1930-31). 

[17] L. R. Salvos a, "Tables of Pearson's Type III function," Annals of Math, fllal., Vol. 

1 (1930), pp 191- 8 et seep 

[18] J. Shohat, arid M Frechet, "A proof of the generalized aerond limit-theorem m the 

theory of probability,” Tram. Am. Math Sac., Vol. 33 (1931), pp. 531 43. 

[19] G. W Snedecor, Calculation and Interpretation of the Analyst .r of Variance and Co- 

variance, Ames, Iowa. Collegiate Press. 

. t(3 

[20] T, J Stiki-tjes, "Tables dea valours des sommess* *» ^ n Ada Math., Vol 10, 

n«*l 

pp 299 302. 

[21] Whittaker and Robinson, The Calculus of Observations, Edinburgh: Blackio and Son, 

second edition, p. 135 

[22] Whittaker and Watson, Modern Analysis, 4th edition, London: Cambridge Uni¬ 

versity Press, 1935. 

[23] Pae-Thi Yuan, "On the, logarithmic frequency distribution and the semi-logarithmic 

correlation surface,” Annals of Malh. Slat., Vol. 4, (1933). 

[2-1] R. T, Zoch, "Some interesting features of frequency curves,” A nnaU of Malh Mat,, 
Vol. 4, (1935), pp. 1-10. 



THE DOOLITTLE TECHNIQUE 

By Paul S. Dwyer 
University of Michigan 

1. Introduction. Most authors who have presented the Doolittle method,
from Doolittle [1] down to the present, have not given any formal proof that the
solution is valid in the general case. They usually are content with a form
describing the various steps of a Doolittle solution.

The author has recently shown [2] that the Doolittle method can be abbreviated
to a technique which is also an abbreviation, essentially, of the method of
single division and its abbreviation which Aitken called the "Method of Pivotal
Condensation" [3]. It appears at once that the validity of the Doolittle method
follows from the validity of the method of single division, a validity which is
readily established.

However one may desire a "proof" which is based directly on the Doolittle
technique without referring to other methods of solution. It is the chief
purpose of this paper to present such a proof. It is accomplished by the
introduction of a notation which precisely describes the conventional Doolittle
process and by proving that this process results in a system of equations whose
prediagonal terms are zero. It is a secondary purpose of the paper to emphasize
the advantages of the Abbreviated Doolittle method and to explain and
illustrate minor variations in the conventional Doolittle technique.

2. The Abbreviated Doolittle solution. We first direct our attention to the
essential parts of a Doolittle solution and these are the last two rows of each
matrix of the standard Doolittle presentation. The additional rows in the
standard presentation are rows of products which are used solely for the purpose
of finding the two bottom rows of each matrix and they need not be recorded,
if a computing machine is available, since the essential information is present
in the two bottom rows. Doolittle [1] did not have calculating machines (he
used multiplication tables) but he put the important information in Table A
and carefully segregated the supplementary information in Table B. With
reference to this he wrote [1]:

"It is to be observed that the numbers in Table B have but a single use while
those in Table A are used over and over, and where the number of equations is
large, it is of great advantage that they should be thus tabulated by themselves
in a form compact and easy of reference."

For purposes of proof, as well as for purposes of calculation if a computing
machine is available, it is only necessary to utilize the forward part of the
Abbreviated Doolittle solution which is the equivalent of the Doolittle Table A.



A four variable illustration of the Abbreviated Doolittle technique is presented
in Table I. The successive equations are indicated by number, as is customary,
and the operation which defines the equation is specified. The actual operation
is indicated more explicitly by the notation of column 3 and this is discussed in
the next section.

The presentation of Table I introduces one variation from the standard Doolittle
method. The division is made by the diagonal coefficient of each row
rather than by its negative. One may still use the old technique, if he prefers,
but it is felt that one can subtract products as easily as he can add products with
modern machines equipped with automatic negative multiplication. In addition
the entries of the equivalent rows then have the same signs and, too, it is
not necessary to take the time to change the signs of the second rows. This
variation uses the same division method as the method of single division [2]
and as the method of pivotal condensation [3] so that the abbreviated form of
these methods is, essentially, the same as the abbreviated form of the Doolittle
method.

The application of this technique leads at each step to a coefficient for each
variable. However if the process is to lead from our four equations in four
unknowns, to three in three, to two in two, to one in one, it follows that all the
entries to the left of the diagonal, which we may call prediagonal entries, must
be zero. That this is true in the general case is the objective of the proofs of
later sections.

3. A notation for and description of the Doolittle technique. A main contribution
of the present article is the use of a notation which describes the Doolittle
technique. As long as the Doolittle process is described loosely by means of
"operations" it is difficult to be precise in defining quantities which appear in
the calculation, but when a notation is used which is definite enough to permit
expansion in terms of the original coefficients, some sort of proof may be
available. The present notation bears some resemblance to that suggested by
Gauss [4], though Gauss used letters to indicate the primary subscripts and
numbers to indicate the number of secondary subscripts and his notation was
directly applicable to the sums of least squares theory rather than to symmetric
equations in general.

We wish to find the solution of the equations

(1) $\sum_{i=1}^{n} a_{ij} x_i = a_{n+1,j}$, $j = 1, 2, \cdots, n$

where the matrix of the coefficients is symmetric. We do this by obtaining
auxiliary equations which feature a decreasing number of variables. No serious
restriction is made if we assume that the variables $x_1$, $x_2$, $x_3$, etc., are eliminated
successively. The Doolittle technique may then be described as follows:

We take the first equation of (1) and divide by its leading coefficient, $a_{11}$, to get

(2) $\sum_{i=1}^{n} b_{i1} x_i = b_{n+1,1}$, where $b_{i1} = \dfrac{a_{i1}}{a_{11}}$,

and we then form

(3) $\sum_{i=1}^{n} a_{i2.1} x_i = a_{n+1,2.1}$ with $a_{i2.1} = a_{i2} - a_{i1} b_{21}$.

We then divide by $a_{22.1}$ and get

(4) $\sum_{i=1}^{n} b_{i2.1} x_i = b_{n+1,2.1}$ with $b_{i2.1} = \dfrac{a_{i2.1}}{a_{22.1}}$.

We next form

(5) $\sum_{i=1}^{n} a_{i3.12} x_i = a_{n+1,3.12}$ with $a_{i3.12} = a_{i3} - a_{i1} b_{31} - a_{i2.1} b_{32.1}$,

and

(6) $\sum_{i=1}^{n} b_{i3.12} x_i = b_{n+1,3.12}$ with $b_{i3.12} = \dfrac{a_{i3.12}}{a_{33.12}}$.

This process is continued so that, in general, we have

(7) $\sum_{i=1}^{n} a_{ij.12\cdots j-1} x_i = a_{n+1,j.12\cdots j-1}$, $j = 1, 2, \cdots, n$

and

(8) $\sum_{i=1}^{n} b_{ij.12\cdots j-1} x_i = b_{n+1,j.12\cdots j-1}$, $j = 1, 2, \cdots, n$

with

(9) $a_{ij.12\cdots j-1} = a_{ij} - a_{i1} b_{j1} - a_{i2.1} b_{j2.1} - \cdots - a_{i,j-2.12\cdots j-3}\, b_{j,j-2.12\cdots j-3} - a_{i,j-1.12\cdots j-2}\, b_{j,j-1.12\cdots j-2}$

and

(10) $b_{ij.12\cdots j-1} = \dfrac{a_{ij.12\cdots j-1}}{a_{jj.12\cdots j-1}}$.

TABLE I. Abbreviated Doolittle technique, forward solution


It is to be noted that the n equations (1) are transformed by this process to
the n auxiliary equations of (7) or (8). The solutions of (1) are also solutions
of these auxiliary equations since the auxiliary equations are linear combinations
of (1). It is our purpose to show that the prediagonal coefficients of these
auxiliary equations are always 0 so that these auxiliary equations feature a
decreasing number of variables.

We may use the term primary subscripts to indicate the first two subscripts
and the term secondary subscripts to indicate the later subscripts which specify
the order of elimination of the variables. The "order" of the coefficient is then
equal to the number of secondary subscripts.
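
The forward process of formulas (9) and (10) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original paper: the function name `doolittle_forward` and the list-of-rows layout are assumptions made here, and the numbers are those of the paper's own numerical illustration (Table II).

```python
def doolittle_forward(rows):
    """Forward part of the Abbreviated Doolittle solution (illustrative sketch).

    rows[j] holds equation j+1 of (1): the coefficients a_1j ... a_nj
    followed by the constant a_{n+1,j}. Returns the "a" rows of (7)
    and the "b" rows of (8).
    """
    n = len(rows)
    a_rows, b_rows = [], []
    for j in range(n):
        # Formula (9): subtract the products of the earlier "a" rows
        # with the fixed "b" entry of equation j in each earlier row.
        a = [rows[j][i] - sum(a_rows[k][i] * b_rows[k][j] for k in range(j))
             for i in range(n + 1)]
        # Formula (10): divide by the diagonal coefficient.
        b = [value / a[j] for value in a]
        a_rows.append(a)
        b_rows.append(b)
    return a_rows, b_rows


# Coefficients and constants of the numerical illustration (Table II).
system = [
    [1.0, 0.4, 0.5, 0.6, 0.2],
    [0.4, 1.0, 0.3, 0.4, 0.4],
    [0.5, 0.3, 1.0, 0.2, 0.6],
    [0.6, 0.4, 0.2, 1.0, 0.8],
]
a_rows, b_rows = doolittle_forward(system)
for a, b in zip(a_rows, b_rows):
    print([round(v, 4) for v in a])
    print([round(v, 4) for v in b])
```

The sketch computes every column of every row, including the prediagonal ones, so that one can see numerically that those entries vanish up to rounding; a hand computation would simply omit them.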





The formula (9) gives the matrix of the final Doolittle set of equations. At
each stage of the reduction one can write down a formula for all the elements
in the matrix at that stage. Thus one can write the coefficients of order h,
$a_{ij.12\cdots h}$, in terms of coefficients of order less than h,

(11) $a_{ij.12\cdots h} = a_{ij} - a_{i1} b_{j1} - a_{i2.1} b_{j2.1} - \cdots - a_{i,h-1.12\cdots h-2}\, b_{j,h-1.12\cdots h-2} - a_{ih.12\cdots h-1}\, b_{jh.12\cdots h-1}$.

It follows at once that

(12) $a_{ij.12\cdots h} = a_{ij.12\cdots h-1} - a_{ih.12\cdots h-1}\, b_{jh.12\cdots h-1} = a_{ij.12\cdots h-1} - \dfrac{a_{ih.12\cdots h-1}\, a_{jh.12\cdots h-1}}{a_{hh.12\cdots h-1}}$.

4. Some theorems on the interchangeability of subscripts. Our main objective
is to prove that the prediagonal terms are zero. In order to do this we first
prove some theorems dealing with the primary and secondary subscripts.

Theorem 1: The value of $a_{ij.12\cdots h}$ is not changed if the primary subscripts are
interchanged. This theorem, which might be stated "The matrix of the coefficients
of a given order is symmetric," follows from the symmetry of the matrix
of coefficients of zero order. We can show that the symmetry of the matrix
having coefficients of order h follows at once from the symmetry of the matrix
having coefficients of order h - 1 by comparing the value $a_{ij.12\cdots h}$ with that of
$a_{ji.12\cdots h}$ obtained by dual substitution in (12). Since the matrix of zero order
coefficients is symmetric by hypothesis, it follows that the matrices of the
coefficients of order 1, 2, 3, 4, etc., are in turn symmetric.

Theorem 2: Any pair of consecutive secondary subscripts may be interchanged
without changing the value of the coefficient. This theorem indicates that, within
prescribed limits, the order of elimination does not have any effect on the result.
Consider the coefficient $a_{ij.\cdots kl \cdots}$ having r secondary subscripts before the
k and s secondary subscripts after the l and consider the corresponding coefficient
$a_{ij.\cdots lk \cdots}$ which results from an interchange of k and l. These coefficients
can be expressed by continued use of (12) in terms of coefficients of order r + 2.
The resulting expansion of $a_{ij.\cdots kl \cdots}$ is equivalent to that of $a_{ij.\cdots lk \cdots}$ with the
interchange of the l and the k. It follows that the theorem is true if $a_{ij.\cdots kl} =
a_{ij.\cdots lk}$. Now a double application of (12) to $a_{ij.\cdots kl}$ leads to the expansion in
terms of coefficients of order r (using the notation $a_{ij.}$ to indicate the coefficient
of the r-th order)

(13) $a_{ij.\cdots kl} = a_{ij.} - \dfrac{a_{ik.}\, a_{jk.}}{a_{kk.}} - \dfrac{\left(a_{il.} - \dfrac{a_{ik.}\, a_{lk.}}{a_{kk.}}\right)\left(a_{jl.} - \dfrac{a_{jk.}\, a_{lk.}}{a_{kk.}}\right)}{a_{ll.} - \dfrac{a_{lk.}^2}{a_{kk.}}}$.

Then $a_{ij.\cdots lk}$ is expanded similarly, the difference is formed and found to be zero.
It follows that the theorem is true.




The application of Theorem 2 with the continued interchange of successive
secondary subscripts in all possible ways leads at once to

Theorem 3: The secondary subscripts may be interchanged in all possible ways
without changing the value of the coefficient. This theorem might be stated "The
value of the resulting coefficient is independent of the order of elimination."
This is the sort of result one would expect to find and indeed, some may feel that
it is intuitively evident, but this formal proof is presented for those who desire
a more rigorous approach.

Theorem 3 enables us to prove Theorem 4 which may be stated: The value of
$a_{ij.12\cdots h}$ is always zero if at least one of the secondary subscripts is equal to one of
the primary subscripts.

Suppose i is this subscript. Then by Theorem 3, i may be placed in the final
position. Now by (12), together with the symmetry of Theorem 1, we have

$a_{ij.\cdots i} = a_{ij.\cdots} - \dfrac{a_{ii.\cdots}\, a_{ji.\cdots}}{a_{ii.\cdots}} = 0$.

A similar statement holds if j appears among the secondary subscripts.

5. The vanishing of the prediagonal entries. As an application of Theorem 4
we can show that the prediagonal entries are identically zero and this is exactly
what is needed to establish the validity of the forward Doolittle process. It is
to be noted that the prediagonal entries are of form $a_{ij.12\cdots j-1}$ with i < j. Then
i must equal one of the secondary subscripts and the term is zero.

It follows that no entries need be made to the left of the diagonal in the
Abbreviated Doolittle solution and, indeed, no entries need be made in the
original matrix below the main diagonal. A numerical problem is presented in
the next section.
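
Theorems 3 and 4 can also be checked numerically by applying the recurrence (12) in two different orders of elimination. The following is a rough sketch under stated assumptions, not a substitute for the proof: the helper name `eliminate` is hypothetical, indices are zero-based, and the matrix is the coefficient matrix of the paper's numerical illustration.

```python
def eliminate(matrix, order):
    """Apply recurrence (12) once for each variable index listed in `order`.

    Returns the full matrix of coefficients whose secondary subscripts
    are the indices in `order` (zero-based, illustrative helper).
    """
    m = [row[:] for row in matrix]
    n = len(m)
    for h in order:
        pivot = m[h][h]
        # a_{ij.(...)h} = a_{ij.(...)} - a_{ih.(...)} * a_{jh.(...)} / a_{hh.(...)}
        m = [[m[i][j] - m[i][h] * m[j][h] / pivot for j in range(n)]
             for i in range(n)]
    return m


coefficients = [
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
]

first_then_second = eliminate(coefficients, [0, 1])
second_then_first = eliminate(coefficients, [1, 0])

# Theorem 3: the two orders of elimination agree (up to rounding).
largest_gap = max(abs(p - q)
                  for row_p, row_q in zip(first_then_second, second_then_first)
                  for p, q in zip(row_p, row_q))
print(largest_gap)

# Theorem 4: any coefficient with a primary subscript among the
# eliminated variables vanishes (up to rounding).
print(first_then_second[0], first_then_second[1])
```

The surviving entries of `first_then_second` in rows and columns 3 and 4 are the second order coefficients $a_{33.12}$, $a_{34.12}$, and $a_{44.12}$ of the illustration.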

6. Illustration. The Abbreviated Doolittle technique is illustrated in Table
II. This illustration is essentially an illustration of a previous article [2] and
serves as the basis, in a later section, for expansion into the standard Doolittle
solution. The check is shown in the right hand column and the back solution
is indicated. The check entries for the first matrix are obtained by adding the
entries in the row to the main diagonal and then adding the entries in the
column. All other check entries are obtained by adding the entries in the row.

The solution is easily made once it is understood and results from continued
application of formula (9). For example

$a_{54.123} = a_{54} - a_{51} b_{41} - a_{52.1} b_{42.1} - a_{53.12} b_{43.12}$

and this is

$a_{54.123} = .8000 - (.2000)(.6000) - (.3200)(.1905) - (.4619)(-.1612) = .6935$

(see the underscored entries of Table II). Terms of this sort are easily computed
if a calculating machine, and especially one equipped with automatic
positive and negative multiplication, is available. The back solution too is
easily accomplished with a machine. It is only necessary to substitute in turn
in each of the "b" equations. Thus the value of $x_4$ is $a_{54.123}/a_{44.123} = b_{54.123}$, the value
of $x_3$ is $b_{53.12} - b_{43.12} b_{54.123} = b_{53.124}$, that of $x_2$ is $b_{52.1} - b_{42.1} b_{54.123} - b_{32.1} b_{53.124} =
b_{52.134}$, etc. The back solution of the check is treated similarly.
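
The back solution just described amounts to substituting in turn in each "b" equation, starting from the last. A minimal sketch follows; the function name `back_solution` is an assumption made here, and the "b" rows are the four-decimal values of the numerical illustration, so the results agree with the tabulated ones only to about four decimals.

```python
def back_solution(b_rows):
    """Solve the "b" equations (8) from the last one upward (illustrative sketch).

    b_rows[j] holds the coefficients of x_1 ... x_n in the j-th "b"
    equation followed by its constant term; prediagonal entries are zero
    and the diagonal entry is one.
    """
    n = len(b_rows)
    x = [0.0] * n
    for j in reversed(range(n)):
        # Subtract the already known later variables from the constant term.
        x[j] = b_rows[j][n] - sum(b_rows[j][i] * x[i] for i in range(j + 1, n))
    return x


# "b" rows of the numerical illustration, rounded to four decimals.
b_rows = [
    [1.0, 0.4000, 0.5000, 0.6000, 0.2000],
    [0.0, 1.0, 0.1190, 0.1905, 0.3810],
    [0.0, 0.0, 1.0, -0.1612, 0.6258],
    [0.0, 0.0, 0.0, 1.0, 1.1748],
]
x = back_solution(b_rows)
print([round(value, 4) for value in x])

# The basic check: substitute in one of the original equations,
# here .6 x1 + .4 x2 + .2 x3 + x4 = .8.
print(0.6 * x[0] + 0.4 * x[1] + 0.2 * x[2] + 1.0 * x[3])
```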

7. A variation in technique. Before proceeding with the presentation of a
standard Doolittle solution it seems wise to indicate another possible variation
in the technique in addition to the division by the diagonal coefficient rather
than its negative. It is possible to obtain the Doolittle solution by using the
fixed entry from the first of the equivalent rows in place of using the fixed "b"
entry and the variable "a". This results from the fact that

(14) $b_{ij.\cdots}\, a_{kj.\cdots} = a_{ij.\cdots}\, b_{kj.\cdots}$.

Thus in Table II the value $a_{54.123}$ can be obtained with the use of

$a_{54.123} = a_{54} - a_{41} b_{51} - a_{42.1} b_{52.1} - a_{43.12} b_{53.12}$

as readily as with the use of

$a_{54.123} = a_{54} - a_{51} b_{41} - a_{52.1} b_{42.1} - a_{53.12} b_{43.12}$.

See the boxed entries of Table II.

There seems to be no real choice between these techniques. The fixed "b"
is traditional in the standard Doolittle solution while the abbreviation of the
method of single division leads to a fixed "a". The point to be emphasized here
is that either the fixed "a" or the fixed "b" can be used. Also (14) is used in
the next section in supplying details for the check portion of a standard Doolittle
method.
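
As a quick arithmetic check of (14), the two expressions for $a_{54.123}$ can be evaluated with the four-decimal entries of the numerical illustration. The variable names below are purely illustrative; the two results differ only in the fifth decimal, which reflects the rounding of the tabulated entries.

```python
# Entries of the illustration, rounded to four decimals as tabulated.
a54 = 0.8000

# Fixed "b" form: a54 - a51*b41 - a52.1*b42.1 - a53.12*b43.12
fixed_b = a54 - 0.2000 * 0.6000 - 0.3200 * 0.1905 - 0.4619 * (-0.1612)

# Fixed "a" form: a54 - a41*b51 - a42.1*b52.1 - a43.12*b53.12
fixed_a = a54 - 0.6000 * 0.2000 - 0.1600 * 0.3810 - (-0.1190) * 0.6258

print(round(fixed_b, 4), round(fixed_a, 4))
```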

8. The standard Doolittle method. If no computing machine is available
or if a more detailed solution is desired, it is preferable to record the individual
products of (9) and thus arrive at the standard Doolittle method. (The division
by the diagonal coefficient rather than its negative is not a fundamental
difference.) The standard Doolittle method, from this point of view, is an expanded
form of the Abbreviated Doolittle method with more details added. Its validity
then follows from the validity of the Abbreviated Doolittle method. While it
is not true that all prediagonal terms vanish in the standard Doolittle method,
and this fact complicates the check by row sums, yet the prediagonal $a_{ij.12\cdots j-1}$
(and $b_{ij.12\cdots j-1}$) are all zero.

The standard Doolittle method is presented in Table III. Some remarks
should be made about the non-recorded terms, the two check solutions, and the
back solution.

The blanks (—) indicate non zero entries which are usually not presented in a
Doolittle solution. They should be considered however if the first check method
is to be used.

The first check method, which is the logical extension of the check method of
the Abbreviated Doolittle solution, has been outlined by Ezekiel [5]. The row
sum is the sum of all the entries in the row whether recorded or not. In order
to check, it is necessary to add these unrecorded entries, and they are available

TABLE II

Abbreviated Doolittle Solution; illustration

    x1        x2        x3        x4                  Check

  1.0000     .4000     .5000     .6000     .2000     2.7000
     —      1.0000     .3000     .4000     .4000     2.5000
     —         —      1.0000     .2000     .6000     2.6000
     —         —         —      1.0000     .8000     3.0000

  1.0000     .4000     .5000    [.6000]    .2000     2.7000
  1.0000     .4000     .5000     .6000    [.2000]    2.7000
             .8400     .1000    [.1600]    .3200     1.4200
            1.0000     .1190     .1905    [.3810]    1.6905
                       .7381   [-.1190]    .4619     1.0810
                      1.0000    -.1612    [.6258]    1.4646
                                 .5903     .6935     1.2837
                                1.0000    1.1748     2.1747

                      1.0000               .8152     1.8152
            1.0000                         .0602     1.0602
  1.0000                                  -.9366      .0634



in the columns above if we make use of formula (14). Thus, if we wish to check
the value $\sum_{i=1}^{5} a_{i1} b_{41} = 1.6200$, we have

$a_{11} b_{41} + a_{21} b_{41} + a_{31} b_{41} + a_{41} b_{41} + a_{51} b_{41} =
a_{41} + a_{41} b_{21} + a_{41} b_{31} + a_{41} b_{41} + a_{51} b_{41} =
.6000 + .2400 + .3000 + .3600 + .1200 = 1.6200.$

Another check method, which is recommended by Peters and Van Voorhis [6],
sums the entries in the row only over those columns which are to be recorded.
This is presented as check method 2 of Table III. As is to be expected, the check
values of the a's and b's of the last two rows of each matrix are in agreement.

It might be noted that one may use the first check method without checking
the intermediate steps (the sums for each row) if he checks the sums for the last
two rows of each matrix.


TABLE III

Doolittle solution, with checks

Notation                  x1       x2       x3       x4               Check     Check
                                                                     Method 1  Method 2

a_{i1}                  1.0000    .4000    .5000    .6000    .2000    2.7000    2.7000
b_{i1}                  1.0000    .4000    .5000    .6000    .2000    2.7000    2.7000

a_{i2}                     —     1.0000    .3000    .4000    .4000    2.5000    2.1000
a_{i1}b_{21}               —      .1600    .2000    .2400    .0800    1.0800     .6800
a_{i2.1}                          .8400    .1000    .1600    .3200    1.4200    1.4200
b_{i2.1}                         1.0000    .1190    .1905    .3810    1.6905    1.6905

a_{i3}                     —        —     1.0000    .2000    .6000    2.6000    1.8000
a_{i1}b_{31}               —        —      .2500    .3000    .1000    1.3500     .6500
a_{i2.1}b_{32.1}                    —      .0119    .0190    .0381     .1690     .0690
a_{i3.12}                                  .7381   -.1190    .4619    1.0810    1.0810
b_{i3.12}                                 1.0000   -.1612    .6258    1.4646    1.4646

a_{i4}                     —        —        —     1.0000    .8000    3.0000    1.8000
a_{i1}b_{41}               —        —        —      .3600    .1200    1.6200     .4800
a_{i2.1}b_{42.1}                    —        —      .0305    .0610     .2705     .0915
a_{i3.12}b_{43.12}                           —      .0192   -.0745    -.1743    -.0553
a_{i4.123}                                          .5903    .6935    1.2838    1.2839
b_{i4.123}                                         1.0000   1.1748    2.1748

b_{i3.124}                                1.0000   -.1894    .8152    1.8152    -.3506
b_{i2.134}                       1.0000    .0970    .2238    .0602    1.0602     .4143     .2160
b_{i1.234}              1.0000    .0241    .4076    .7049   -.9366     .0634    1.3049     .9076     .4241


The back solution is carried out as in Table II. If no computing machine is
available or if the detailed steps are desired they may be indicated as in Table
III. The entries in the box under the $x_4$ column are respectively $b_{54.123} b_{43.12}$,
$b_{54.123} b_{42.1}$, and $b_{54.123} b_{41}$. Those in the preceding column are $b_{53.124} b_{32.1}$ and
$b_{53.124} b_{31}$. The other entry is $b_{52.134} b_{21}$. The values of the coefficients are
obtained by subtracting these row entries from the constant term of the
corresponding "b" equation. Thus, $b_{53.124} = (.6258) - (-.1894)$; $b_{52.134} =
(.3810) - .0970 - .2238$, etc. The back solution of check method 1 agrees
with that of check method 2. A form for accomplishing the back solution of
the check is indicated at the right. It is not necessary to complete the back
solution of the check if it is not desired, and indeed, there are some who feel
that the use of the row sum check is unnecessary with modern computing
machines [7]. The basic check is substitution in the original equations.

9. Summary. The chief purpose of this paper is to show that the Doolittle
technique actually leads to a set of equations featuring a decreasing number of
unknowns. This is accomplished by the introduction of an appropriate notation
to describe the process and the establishment of certain theorems which serve
to validate the process. These theorems are of some interest aside from the
application made here. It is a secondary purpose of this paper to emphasize
the practicability and theoretical advantages (relative ease of calculating,
theoretically more accurate, less chance for numerical error, less recording, less time
consuming, more compact, and more easily checked) of the Abbreviated Doolittle
method and to explain and illustrate possible variations in technique in the
forward and check (by row sums) portions of the standard Doolittle solution.
It should be noted that the notation suggested is very useful in providing an
easy development of various theorems used in multiple and partial correlation
studies, the presentation of which is not the purpose of the present paper.

REFERENCES

[1] M. H. Doolittle, "Method employed in the solution of normal equations and the
adjustment of a triangulation," U. S. Coast and Geodetic Survey Report (1878),
pp. 116-120.

[2] P. S. Dwyer, "The solution of simultaneous equations," Psychometrika, Vol. 6 (1941),
pp. 101-129.

[3] A. C. Aitken, "Studies in practical mathematics I. The evaluation, with applications,
of a certain triple product matrix," Roy. Soc. Edin. Proc., Vol. 57 (1937), pp.
172-181.

[4] C. F. Gauss, "Supplementum theoriae combinationis observationum erroribus minimis
obnoxiae," Werke, Vol. 4 (1873), pp. 69-71.

[5] Mordecai Ezekiel, Methods of Correlation Analysis, John Wiley and Sons, Inc.,
New York (1930), pp. 302-364.

[6] C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical
Bases, McGraw-Hill (1940), pp. 228-229.

[7] A. K. Kurtz, "The use of the Doolittle method in obtaining related multiple correlation
coefficients," Psychometrika, Vol. 1 (1936), pp. 46-61.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A PROBLEM IN ESTIMATION 

By Joseph F. Daly
The Catholic University of America 

Several recent psychological studies in the field of memory testing [1], [2], [3]
have suggested the following problem. Let each individual E in our population
be characterized by the variates $y^1, \cdots, y^p$; $y^{p+1}, \cdots, y^{p+t}$ $(p > t)$.
Suppose, however, that circumstances make it impossible for us to observe the last
t variates. For example, we may think of $y^1, \cdots, y^p$ as an individual's scores
on a battery of tests, and think of $y^{p+1}, \cdots, y^{p+t}$ as measures of certain
psychological characteristics which, though affecting the individual's performance, are
not subject to direct observation. To make up for this, assume that we have
a theory which tells us that if $y^{p+1}, \cdots, y^{p+t}$ are held constant, then the
observable y's are dependent upon them according to a specified regression equation

$\bar{y}^i = x^i_\mu y^\mu$, $(i = 1, \cdots, p;\ \mu = p + 1, \cdots, p + t)$.

Somewhat more precisely, we assume the distribution laws

(1) $f(y^1, \cdots, y^{p+t}) = (2\pi)^{-\frac{1}{2}(p+t)} |A_{rs}|^{\frac{1}{2}} \exp\{-\tfrac{1}{2} A_{rs}(y^r - a^r)(y^s - a^s)\}$,

(where $r, s = 1, \cdots, p + t$, and repeated indices are to be summed according
to the usual convention) and

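(2) $f(y^1, \cdots, y^p \mid y^{p+1}, \cdots, y^{p+t}) = (2\pi\sigma^2)^{-\frac{1}{2}p} \exp\{-\tfrac{1}{2\sigma^2} \sum_i (y^i - \bar{y}^i)^2\}$.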

The $x^i_\mu$ are supposed to be known, but except for the conditions imposed by (1)
and (2) nothing is known about the quantities $A_{rs}$, $a^r$, and $\sigma$. Having observed
the test scores $y^i_\alpha$ $(\alpha = 1, \cdots, N)$ obtained by N individuals $E_\alpha$ drawn at
random from the population, we wish to estimate the values $y^{p+1}_\alpha, \cdots, y^{p+t}_\alpha$
corresponding to each $E_\alpha$, and the essential parameters in the distribution law
(1), particularly the variances and covariances of $y^{p+1}, \cdots, y^{p+t}$.

We can easily find optimum estimates of the $y^\mu_\alpha$ by applying the method of
maximum likelihood to the function (2) after substituting for the $y^i$ the scores
$y^i_\alpha$ obtained by the individual in question. Thus if we write



(assuming thereby that the rank of the matrix $\|x^i_\mu\|$ is t) we have
(3) - f’flU ■ 

These estimates are unbiased in the sense that the expected value of $\hat{y}^\mu_\alpha$ calculated
from the distribution law (2) is $y^\mu_\alpha$.

But when we come to estimate the variances and covariances involved in (1),
the procedure is less straightforward. Under the present circumstances we
cannot use the expression


(4) N j T, or. - rnf - /), 

for the sample covariance of $y^\mu$ and $y^\nu$. We might, of course, try substituting
the estimates $\hat{y}^\mu_\alpha$ from (3) for the unknown $y^\mu_\alpha$ in (4). But this expedient will
in general produce a biased estimate. Denoting the required covariance by
$A^{\mu\nu}$ (the element in the appropriate position in the inverse of the matrix $\|A_{rs}\|$),
we find as a matter of fact that the expected value of (4) when the $y^\mu_\alpha$ are
replaced by their estimates $\hat{y}^\mu_\alpha$ is
(5) A* + <r s r Mr . 


This bias may or may not be important in any given case. But it can conceivably
be quite serious if the $A^{\mu\nu}$ are relatively small, especially if such expressions
are employed in the usual way to estimate the correlation coefficient rather than
the covariance.

Perhaps the most logical way to attack the problem is through the joint
distribution of $y^1, \cdots, y^p$ alone, obtainable by integrating the undesirable
variates $y^{p+1}, \cdots, y^{p+t}$ out of (1). We therefore consider
(6) f(y\ ■ ■ ■ , V v ) = (2"-)“*” 1 1„ I 1 exp 1 -\1<,(y' - a , )(f - a J )J, 
where 

- a,„b"x >( || ir || - \\a„\\-\ 


Moreover, when account is taken of (2), we find that we must have 


A 




f » XW 


($\delta$ being Kronecker's delta). If we now form the likelihood function
$\prod_{\alpha=1}^{N} f(y^1_\alpha, \cdots, y^p_\alpha)$ from (6) for our sample, and set its derivatives with respect
to the $a^i$, $\sigma^2$, and the $A^{\mu\nu}$ equal to zero, we arrive, after some simplification, at
the equations





(7) 


Jf 23 (V« - ®i«")(?/« - 5. 3 = 0, 

{a° - i Z (»«- - *;<o} *:*; = o, 

a ,j = o-v + x;^^ 


for determining the maximum likelihood estimates. The first of equations (7)
is already solved for the $a^i$, and the solution of the simultaneous equations for
the remaining essential parameters yields the estimates


(9) A'- - i E <y". - tf\W, - <0 - ."9 ! . 

A considerable amount of algebraic manipulation is required to put the solutions
in the form given above, but since the results are about what one would
expect in view of (5), we omit the details. As is often the case, some bias
remains in the "optimum" estimates (9). However, this can be eliminated by
writing N - 1 in place of N. The estimate (8) of $\sigma^2$ is unbiased as it stands.


REFERENCES 

[1] T. V. Moore, "The analysis of association by its equational constants," Aspects of the
New Scholastic Philosophy, New York, Benziger Brothers, 1932, pp. 181-225.

[2] Regis Holland, "The development of logical and rote memory," Studies in Psychology
and Psychiatry, the Catholic University of America, Vol. 4, no. 8 (1940).

[3] Helen de Sales Forrest, "Correlations between the constants in the curve of learning,"
Studies in Psychology and Psychiatry, the Catholic University of America,
Vol. 5, no. 1 (1941).


CONFIDENCE LIMITS FOR AN UNKNOWN DISTRIBUTION FUNCTION 

By A. Kolmogoroff 
Moscow, U.S.S.R. 

Let $x_1, x_2, \cdots, x_n$ be mutually independent random variables following the
same distribution law

(1) $P\{x_i \le \xi\} = F(\xi)$.

A recent paper by A. Wald and J. Wolfowitz¹ deals with the problem of using
the observable values of the x's to estimate the function $F(\xi)$. In this connection
it may be useful to recall the following results published by me in 1933.²
Put

(2) $F_n(\xi) = \dfrac{N(\xi)}{n}$

where $N(\xi)$ denotes the number of those x's whose observed values do not
exceed $\xi$.

¹ A. Wald and J. Wolfowitz, "Confidence limits for distribution functions," Annals of
Math. Stat., Vol. 10 (1939), pp. 105-118.

Theorem 1: If the function $F(\xi)$ is continuous then the distribution law of the
quantity

(3) $D_n = \sup |F(\xi) - F_n(\xi)| \sqrt{n}$

does not depend on $F(\xi)$.

Denote by $\Phi_n(\lambda)$ the value of the probability $P\{D_n < \lambda\}$ which is common
to all continuous distribution functions $F(\xi)$.

Theorem 2: For n tending to infinity, the distribution function $\Phi_n(\lambda)$ tends to

(4) $\Phi(\lambda) = \sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2\lambda^2}$

uniformly with respect to $\lambda$.

A more elementary proof of Theorem 2 was given by N. Smirnoff in 1939.³
Another paper by the same author⁴ gives a table of the function $\Phi(\lambda)$.
Without the assumption that $F(\xi)$ is continuous, we easily obtain

Theorem 3: Whatever be the distribution function $F(\xi)$,

(5) $P\{D_n < \lambda\} \ge \Phi_n(\lambda)$.

Theorems 1 and 3, giving the exact lower bound of the probability that $F_n(\xi)$
will satisfy the inequality

(6) $|F(\xi) - F_n(\xi)| < \dfrac{\lambda}{\sqrt{n}}$

for all values of $\xi$, can be used to establish confidence limits for $F(\xi)$
corresponding to the confidence coefficient

(7) $\alpha = \Phi_n(\lambda)$.

These confidence limits will be free from any restriction concerning the nature
of the function $F(\xi)$.

² A. Kolmogoroff, "Sulla determinazione empirica di una legge di distribuzione," Giornale
dell'Istituto Italiano degli Attuari, Vol. 4 (1933), pp. 83-91.

³ N. Smirnoff, "Sur les écarts de la courbe de distribution empirique," Recueil Math. de
Moscou, Vol. 6 (1939), pp. 3-26.

⁴ N. Smirnoff, "On the estimation of the discrepancy between empirical curves of
distribution for two independent samples," Bulletin de l'Université de Moscou, Série internationale
(Mathématiques), Vol. 2, fasc. 2 (1939).





For sufficiently large values of $n$ we can use the limiting distribution (4) and
write

(8) $\alpha = \Phi(\lambda)$.

The following short table, based on that of Smirnoff,⁴ gives the values of $\lambda$
corresponding to a few chosen confidence coefficients $\alpha$.

TABLE OF $\lambda$

  $\alpha$      $\lambda$
  .95       1.35
  .98       1.52
  .99       1.63
  .995      1.73
  .998      1.86
  .999      1.95
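
The tabulated values can be checked against the series (4) directly. The sketch below is an illustration added here, not part of the paper. The function name and the fixed number of terms are assumptions, and the truncation is adequate only for moderate $\lambda$ such as those in the table.

```python
import math


def kolmogorov_phi(lam, terms=100):
    """Evaluate the limiting distribution (4): sum over all k of (-1)^k exp(-2 k^2 lam^2)."""
    if lam <= 0:
        return 0.0
    total = 1.0  # the k = 0 term
    for k in range(1, terms + 1):
        # the terms for k and -k are equal, hence the factor 2
        total += 2.0 * (-1) ** k * math.exp(-2.0 * k * k * lam * lam)
    return total


# (alpha, lambda) pairs copied from the table above
TABLE = [(0.95, 1.35), (0.98, 1.52), (0.99, 1.63),
         (0.995, 1.73), (0.998, 1.86), (0.999, 1.95)]

for alpha, lam in TABLE:
    print(f"alpha = {alpha}   lambda = {lam}   Phi(lambda) = {kolmogorov_phi(lam):.4f}")
```

Evaluated this way, each $\Phi(\lambda)$ agrees with the tabulated $\alpha$ to within a few thousandths. The largest gap is at $\lambda = 1.35$, where the series gives roughly 0.948.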


Smirnoff's paper⁴ contains still another application of the function $\Phi(\lambda)$.
Denote by $x'_1, x'_2, \ldots, x'_{n_1}$ and $x''_1, x''_2, \ldots, x''_{n_2}$ two sequences of mutually
independent random variables following the same probability law $F(\xi)$. Let further
$F_{n_1}(\xi)$ and $F_{n_2}(\xi)$ be two random step functions corresponding to these series,
defined as in (2). Smirnoff proves then the following

Theorem 4: If the probability law $F(\xi)$ is continuous, then the probability

(9) $P\left\{ \sup_{\xi} |F_{n_1}(\xi) - F_{n_2}(\xi)| \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}} < \lambda \right\} = \Phi_{n_1, n_2}(\lambda)$

is independent of the function $F(\xi)$. If $n_1$ and $n_2$ are indefinitely increased subject
to the restriction that the ratio $n_1/n_2$ remains between two fixed numbers $a_1$ and $a_2$

(10) $0 < a_1 < \dfrac{n_1}{n_2} < a_2 < +\infty$

then

(11) $\Phi_{n_1, n_2}(\lambda) \to \Phi(\lambda)$.

In the general case, where the probability law $F(\xi)$ is absolutely arbitrary we have

(12) $P\left\{ \sup_{\xi} |F_{n_1}(\xi) - F_{n_2}(\xi)| \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}} < \lambda \right\} \ge \Phi_{n_1, n_2}(\lambda)$.

Owing to the above results the quantity

(13) $D_{n_1, n_2} = \sup_{\xi} |F_{n_1}(\xi) - F_{n_2}(\xi)| \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$

could be used as a criterion to test the hypothesis that the probability laws of
the two series of observable variables are actually the same.
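
A minimal sketch of the two-sample criterion follows, added for illustration only. The printed formula for the criterion is damaged in this copy, so the normalising factor $\sqrt{n_1 n_2/(n_1 + n_2)}$ used below is an assumption taken from the usual statement of Smirnoff's theorem. The function name is hypothetical.

```python
import bisect
import math


def two_sample_statistic(xs, ys):
    """sup |F_n1 - F_n2| scaled by sqrt(n1 * n2 / (n1 + n2)) (assumed normalisation)."""
    a, b = sorted(xs), sorted(ys)
    n1, n2 = len(a), len(b)
    # Both step functions change only at observed values, so the supremum
    # of their absolute difference is attained at one of the pooled observations.
    sup_diff = max(
        abs(bisect.bisect_right(a, v) / n1 - bisect.bisect_right(b, v) / n2)
        for v in a + b
    )
    return sup_diff * math.sqrt(n1 * n2 / (n1 + n2))


# Hypothetical, completely separated samples: the unscaled supremum is 1.
print(two_sample_statistic([1, 2, 3, 4], [5, 6, 7, 8]))
```

For large samples one might compare the result with a value of $\lambda$ from the table above, rejecting the hypothesis of a common law when the statistic exceeds it. That decision rule is a reading of the text, not a procedure spelled out in it.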









CORRECTIONS TO A PAPER ON THE UNIQUENESS PROBLEM
OF MOMENTS

By M. G. Kendall

London, England


I wish to make certain corrections in my paper on "Conditions for Uniqueness
in the Problem of Moments" (Annals of Math. Stat., Vol. 11 (1940), p. 402).
I thought I had succeeded in improving on results given earlier by Stieltjes,
Lévy and Carleman, but this is not so.

Theorem 1 of the paper stated that a set of moments determines a distribution
uniquely if $\sum_{r=0}^{\infty} \nu_r t^r / r!$ converges for some real non-zero $t$, $\nu_r$ being the
absolute moment of order $r$. This is true, and a similar result has been proved by Lévy,
but my proof contained a small lacuna. It was shown that the characteristic
function $\phi(t)$ has a Taylor expansion which, under the conditions of the theorem,
is convergent; but it has also to be shown that it is equal to the sum of that
expansion. This may be seen as follows:

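We have

$\left| e^{itx} - \sum_{r=0}^{n-1} \dfrac{(itx)^r}{r!} \right| \le \dfrac{|tx|^n}{n!}$

and hence, on taking mean values,

$\left| \phi(t) - \sum_{r=0}^{n-1} \dfrac{(it)^r \mu_r}{r!} \right| \le \dfrac{\nu_n |t|^n}{n!}$.

Since by hypothesis $\nu_n |t|^n / n! \to 0$, $\phi(t)$ must be equal to the sum of its (convergent)
Taylor expansion.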

The principal error was a statement that $\nu_n^{1/n}/n$ must either tend to a limit or
diverge. For this reason, the second theorem should run: a set of moments
determines a distribution uniquely if $\limsup \nu_n^{1/n}/n$ is finite (not $\lim \nu_n^{1/n}/n$ as originally
stated). Theorem 3 should also be restated with the upper limit substituted
for the limit therein.

Theorem 4 stated that a set of moments uniquely determines a distribution
if $\sum 1/\nu_n^{1/n}$ diverges. A rigorous proof is as follows:

The characteristic function obeys the relation

$|\phi^{(n)}(t)| \le \nu_n, \quad n \ge 1$


provided, of course, that $\nu_n$ exists. A theorem of Denjoy¹ states that if a function
$f(x)$, defined in the segment $(a, b)$, possesses derivatives of all orders therein,
if $M_n$ is the maximum of $|f^{(n)}(x)|$ in the segment and if $\sum 1/M_n^{1/n}$ is divergent,
then $f(x)$ is completely determined by its value and that of its derivatives at a
single point. $\phi(t)$ obeys the conditions of the theorem and by taking the point
to be $t = 0$, theorem 4 follows.

¹ Arnaud Denjoy, "Sur les fonctions quasi-analytiques de variable réelle," Comptes Rendus,
Vol. 173 (1921), p. 1399.

I hope that this note will correct any misunderstandings that may have arisen 
on the main paper, and I regret that a number of circumstances, not the least 
of which is war, have made it impossible to forward the correction at an 
earlier date. 


ANNOUNCEMENT CONCERNING COMPUTATION OF 
MATHEMATICAL TABLES 

In the December, 1939, issue of the Annals of Mathematical Statistics, p. 399,
there appeared an Announcement of the Mathematical Tables Project. This
project is operated by the Work Projects Administration of New York City,
as O. P. No. 205--2 4)7-11 under the technical supervision of Dr. A. N. Lowan.
It is sponsored by the National Bureau of Standards, Dr. Lyman J. Briggs,
Director.

In order to keep the readers of the Annals up-to-date on the progress of the 
work of the Project, information will be released from time to time. 

The following list shows the status of work, as of October, 1941. The reader
is referred to the December, 1939 issue of the Annals with respect to which n
will denote the nth item of Tables Published, Pn will denote the nth item of
Tables in Progress and Cn will denote the nth item of Tables under Consideration.

Tables published. 1, 2, 3, P1, P2, P3, P4, P6(b), P6(c), P6(d), P6(e), P7,
C7 and also

1. Table of Five-Point Lagrangian Interpolants for arguments ranging between
0 and 2 at intervals of 0.001.

2. Tables of Grid Coordinates (American Polyconic Projection) at 5 minute
intervals of latitude and longitude for latitude from 70°N to 28°N and for
latitude from 49°N to 72°N.

3. Table for Map Projections of Northwestern Extension of II, K,

Tables in process of reproduction. P5, P6(a), P8 and C1 for [0 (.001) 7 (.01)
50 (.1) 300 (1) 2,000 (10) 10,000; 12D] also

1. Tables of Section Moduli and Moments of Inertia for Structural Members
used in Naval Architecture. (For the Bureau of Marine Inspection and
Navigation.)

2. Tables of Si(x) and Ci(x) for x ranging from 10 to 100 at intervals of 0.001.





3. The zeros of the Legendre Polynomials up to the 16th order to 15 decimal
places and the Weight Coefficients for Gauss' Mechanical Quadrature Formula.

Tables for which manuscripts are completed. P9, P11, C6, (the function $x^y$,
instead of A(x, y), has been tabulated to 15 places), and also

1. Table of $\int_0^x J_0(t)\,dt$ from 0 to 10 at intervals of 0.01 to 10 places.

Tables for which computations are completed. P10 (also tanh x, coth x),
C2, C3, (change to n = −21, −20, ..., 0) and also

1. Various hydraulic tables based on Kutter's and Manning's formulae.
(Tabulation suggested by the War Department.)

2. Table of reciprocals of the integers from 100,000 to 200,000.

3. Table of the Associated Legendre Functions $P_n^m(x)$ and $Q_n^m(x)$ for n ranging
between 1 and 10, and m between 0 and 4; for arguments x and ix where x
ranges between 0 and 10 at intervals of 0.1. Also corresponding values for
half-integral values of n and values of the functions for arguments in degrees.
(Tabulation suggested by National Defense Research Committee.)

4. Tables of R sin θ and R cos θ; R = 1000 (10) 10,000, θ = 5 (5) 800 (in
mils).


Tables for which computations are in progress. C3 (for n = 1, 2, ..., 20)
and also

1. Table of the Bessel Functions $Y_0(z)$ and $Y_1(z)$ for the same complex
arguments as in $J_0(z)$ and $J_1(z)$, mentioned in P9.

2. Tables of Length of Meridional Arc at one-minute intervals.

3. Tables of the Confluent Hypergeometric Function for selected values of
the parameters.

4. Tables of three-point, four-point, six-point and seven-point Lagrangian
Interpolants.

5. Table of Tchebysheff Polynomials.


Tables under consideration. C4 and also 

1. Table of the first 10 powers of the reciprocals of the integers from 1 to 1,000.

2. Extensive tables of Elliptic Functions for both real and imaginary
arguments.

3. A 12-place table of Inverse Circular and Hyperbolic Functions other than
Arc tan x.

4. Table of the Integral $\int_0^x Y_0(t)\,dt$.

5. Tables of the non-periodic solutions of the Mathieu Differential Equation.

6. Table of the Error Functions for complex arguments (suggested by Federal
Communication Commission).

7. Tables of the Unit-Sigma Functions and their integrals.





8. Tables of Circular Functions for Complex Arguments.

9. Tables of the Zeros of the Hermite and Laguerre Polynomials and of the
corresponding Weight Factors in Gauss' Mechanical Quadrature Formula.

10. Table of Lamé Polynomials.

11. Table of Military Grid Coordinates for certain "Control Stations." (For
the War Department.)

12. Tables of the Chi-Square Distribution and "Student's" Distribution.

13. Tabulation of Fisher's A-, B-, and C-Distributions of the Multiple Correlation
Coefficients.

The Project would welcome suggestions for the computation of new tables of
interest in pure and applied mathematics, as well as information regarding
computational work in progress elsewhere.

Communications should be addressed to Major Irving V. Huie, Administrator,
Work Projects Administration, 70 Columbus Avenue, New York City.

Requests for copies of published tables should be addressed to Dr. Lyman J.
Briggs, Director of the National Bureau of Standards, Washington, D. C.



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The Fourth Summer Meeting of the Institute of Mathematical Statistics was
held at The University of Chicago, Tuesday to Thursday, September 2 to 4,
1941, in conjunction with the meetings of the American Mathematical Society,
the Mathematical Association of America, and the Econometric Society. The
following sixty-eight members of the Institute attended the meeting:


R. L. Anderson, T. W. Anderson, K. J. Arnold, H. M. Bacon, Walter Bartky, W. D.
Baten, A. A. Bennett, Paul Boschan, I. W. Burr, J. H. Bushey, W. E. Cederberg, W. G.
Cochran, A. T. Craig, C. C. Craig, J. H. Curtiss, J. F. Daly, W. E. Deming, Ihioh.,
P. L. Dressel, P. S. Dwyer, Churchill Eisenhart, M. L. Elveback, H. P. Evans, .Fincher,
W. C. Flaherty, R. M. Foster, C. H. Graves, Louis Guttman, W I Hmo., F. C. Hinds,
A. S. Householder, E. V. Huntington, William Hurwitz, M. H. Ingraham, Dunham Jackson,
Leo Katz, J. F. Kenney, L. A. Knowler, L. F. Knudsen, Tjalling Koopmans, C. F. Kossack,
O. E. Lancaster, D. H. Leavens, B. A. Lengyel, W. G. Madow, ,l \ Mudm >, A. M. Mood,
J. E. Morton, I,eah Nftugh 1, Harold Nisselson, J. I. Northam, E. G. Olds, Oystein Ore,
C. K. Payne, G. A. D. Preinreich, Francis Regan, Selby Robinson, C. F. Roos, M. M.
Sandomire, Max Sasuly, Henry Scheffé, H. M. Schwartz, llarrv SiJl>*r, J. H. Smith, M. E.
Wescott, S. S. Wilks, E. W. Wilson, Gale Young.


The opening session, on Tuesday morning, was devoted to contributed papers
on Probability and Statistics and was held jointly with the American
Mathematical Society and the Econometric Society. The Chairman was Professor
A. T. Craig, University of Iowa, and the following papers were presented:


1. A geometric derivation of Fisher's z-transformation.

J. B. Coleman, University of South Carolina.

2. Large sample distribution of the likelihood ratio.

Abraham Wald, Columbia University.

3. On the integral equation of renewal theory.

(Read by title.)

Willy Feller, Brown University.

4. Cumulative frequency functions.

Irving Burr, Purdue University.

5. On spherical probability distributions.

K. J. Arnold, Massachusetts Institute of Technology.

6. Some observations on analysis of variance theory.

(Read by title.)

Hilda Geiringer, Bryn Mawr College.

7. On the asymptotic distribution of medians of samples from a multivariate population.

A. M. Mood, University of Texas.

8. A problem of estimation.

J. F. Daly, Catholic University.

Abstracts of these papers follow this report, 


On Tuesday afternoon a session was held jointly with the Econometric Society
on Time Series Analysis. Under the chairmanship of Professor C. C. Craig of
the University of Michigan, the following papers were presented:



1. Is sampling theory applicable to economic time series?

Tjalling Koopmans, Penn Mutual Life Insurance Co., Philadelphia.

2. Serial correlation.

R. L. Anderson, North Carolina State College.

The morning session on Wednesday was held jointly with the Econometric
Society on Curve Fitting. The chair was held by Dr. J. Marschak of the New
School for Social Research and the following papers were presented:

1 Wrig/i(H (ii enmfirntnili 1 far Irnnsfoimalwti m rvflr filling 

T. O. Yntema, University of Chicago and Cowles Commission.

2. Curve fitting by cumulative addition.

John H. Smith, University of Chicago and Cowles Commission.

On Wednesday afternoon, Professor S. S. Wilks of Princeton University acted
as chairman of a session on Multivariate Analysis. The following papers were
read:

1. On testing sets of means and discriminant analysis.

Abraham Wald, Columbia University.

2. On tests of hypotheses concerning variances and covariances.

William G. Madow, Bureau of the Census.

The Josiah Willard Gibbs Lecture of the American Mathematical Society was
delivered on Wednesday evening by Professor Sewall Wright of the University
of Chicago. His topic was Statistical Genetics and Evolution.

On Thursday morning a joint session on Demand and Supply Analysis was
held with the Econometric Society. At this session Dr. C. F. Roos of the
Institute of Applied Econometrics presided, and the following papers were
presented:

1. Demand analysis for certain commodities based on income and budget data.

J. Marschak, New School for Social Research, and Ci or#* f »*»r vr’> „ Bureau
of Economic Research.

2. Derivation of elasticities of demand and supply: A direct method.

Oscar Lange, University of Chicago and Cowles Commission.

3. On the workings of a general equilibrium system.

J. L. Mosak, University of Chicago and Cowles Commission.

An informal reception was held on Monday evening in the Judson Court
Lounge. On Tuesday and Wednesday afternoons the ladies of the Mathematics
Department of the University of Chicago served tea in the Eckhart Hall Common
Room. After the joint session on Tuesday afternoon, the Cowles Commission
for Research in Economics gave a tea in the Common Room of the Hrirore
Building. On Thursday evening a joint dinner of the four mathematical
organizations was held in Hutchinson Commons, preceded by an informal reception
at the Reynolds Club.

Edwin G. Olds,

Secretary



ABSTRACTS OF PAPERS 

(Presented on September 2, 1941, at the Chicago Meeting of the Institute) 


A Geometric Derivation of Fisher's z-transformation. J. B. Coleman,
University of South Carolina.

In fitting points in a plane by a line so that the sum of the squares of the perpendicular
deviations shall be a minimum, a second line is found for which the sum of the squares of
the deviations is a maximum. Let $\sum d^2$ be the sum of the squares of the deviations of the
points from the minimum line, and $\sum D^2$ be the sum of the squares from the maximum line.
Then $\sum D^2 / \sum d^2 = (1 + r)/(1 - r)$. $\tfrac{1}{2} \log (1 + r)/(1 - r)$ is Fisher's z-transformation for
testing the coefficient of correlation.
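
The relation in this abstract can be checked numerically. The sketch below is an illustration added here and rests on assumptions the abstract does not state: both variables are first reduced to standard units, only lines through the centroid are considered, and $r$ is positive. All function names and the data are hypothetical. The extreme sums of squared perpendicular deviations are computed as the two characteristic values of the 2 × 2 matrix of sums of squares and products.

```python
import math


def standardise(values):
    """Centre the values and scale them to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def extreme_perpendicular_sums(xs, ys):
    """Return (minimum sum, maximum sum, r) for lines through the centroid."""
    u, v = standardise(xs), standardise(ys)
    a = sum(p * p for p in u)               # sum of squares of standardised x
    c = sum(q * q for q in v)               # sum of squares of standardised y
    b = sum(p * q for p, q in zip(u, v))    # sum of products, equal to n * r
    centre = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    # characteristic values of [[a, b], [b, c]]
    return centre - radius, centre + radius, b / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
sum_d2, sum_D2, r = extreme_perpendicular_sums(xs, ys)
z = 0.5 * math.log(sum_D2 / sum_d2)
print(r, z, math.atanh(r))
```

Under these assumptions the two sums are $n(1 - r)$ and $n(1 + r)$, so half the logarithm of their ratio coincides with the inverse hyperbolic tangent of $r$.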


Large Sample Distribution of the Likelihood Ratio. Abraham Wald, Columbia 
University. 


The large sample distribution of the likelihood ratio has been derived by S. S. Wilks
(Annals of Math. Stat., Vol. 9 (1938)) in case of a linear composite hypothesis and under
the assumption that the hypothesis to be tested is true. Here a general composite
hypothesis is considered and the distribution in question is derived also in case that the
hypothesis to be tested is not true. Let $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ be the joint probability
density function of the variates $x_1, \ldots, x_p$ involving $k$ unknown parameters $\theta_1, \ldots, \theta_k$.
Denote by $H_\omega$ the hypothesis that the true parameter point $\theta = (\theta_1, \ldots, \theta_k)$ satisfies the
equations $\xi_1(\theta) = \cdots = \xi_r(\theta) = 0$, $(r \le k)$. Denote by $\lambda_n$ the likelihood ratio statistic for
testing $H_\omega$ on the basis of $n$ independent observations on $x_1, \ldots, x_p$. For any parameter
point $\theta$ let $\xi_{ij}(\theta) = \partial \xi_i(\theta)/\partial \theta_j$ and let $c_{ij}(\theta)$ be the expected value of
$-\partial^2 \log f(x_1, \ldots, x_p, \theta)/\partial \theta_i \, \partial \theta_j$ calculated under the assumption that $\theta$ is the true
parameter point.

For any $\theta$ denote by $A(\theta)$ the matrix $\|\xi_{ij}(\theta)\|$ $(i = 1, \ldots, r;\ j = 1, \ldots, k)$ and let
$\|\sigma_{ij}(\theta)\| = \|c_{ij}(\theta)\|^{-1}$, $(i, j = 1, \ldots, k)$. Let furthermore $\|\sigma^*_{uv}(\theta)\|$, $(u, v = 1, \ldots, r)$
be the matrix equal to the product $A(\theta) \cdot \|\sigma_{ij}(\theta)\| \cdot \bar{A}(\theta)$, where $\bar{A}(\theta)$ is the transpose of
$A(\theta)$. Finally let $\|c^*_{uv}(\theta)\| = \|\sigma^*_{uv}(\theta)\|^{-1}$, $(u, v = 1, \ldots, r)$. For each $n$ and $\theta$ denote
by $y_{1n}(\theta), \ldots, y_{rn}(\theta)$ a set of $r$ variates which have a joint normal distribution with mean
values $\sqrt{n}\,\xi_1(\theta), \ldots, \sqrt{n}\,\xi_r(\theta)$ and covariance matrix $\|\sigma^*_{uv}(\theta)\|$, $(u, v = 1, \ldots, r)$.
Denote the quadratic form $\sum_{u} \sum_{v} c^*_{uv}(\theta)\, y_{un}(\theta)\, y_{vn}(\theta)$ by $Q_n(\theta)$. It has been shown that under
certain assumptions on $f(x, \theta)$, $\xi_1(\theta), \ldots, \xi_r(\theta)$ we have
$\lim_{n \to \infty} \{P[-2 \log \lambda_n < t \mid \theta] - P[Q_n(\theta) < t \mid \theta]\} = 0$ uniformly in $t$ and $\theta$, where for any $z$,
$P(z < t \mid \theta)$ denotes the probability that $z < t$ holds under the assumption that $\theta$ is the true
parameter point. The distribution of $Q_n(\theta)$ is known and has been treated in the literature.
If $H_\omega$ is true, then $\xi_1(\theta) = \cdots = \xi_r(\theta) = 0$, and $Q_n(\theta)$ has the $\chi^2$ distribution with $r$
degrees of freedom.


On the Integral Equation of Renewal Theory. W. Feller, Brown University.

As is well-known, the equation $U(t) = G(t) + \int_0^t U(t - x)\, dF(x)$ has frequently been
discussed, under different forms, in connection with the population theory, the theory of
industrial replacement, etc. In the present paper it is shown that, using Tauberian
theorems for Laplace integrals, it becomes possible to analyze in detail the asymptotic
behavior of $U(t)$ as $t \to \infty$ and also to solve some other problems which have been discussed
in the literature. Strict conditions for the validity of different methods to treat the
equation are given together with some modifications found to be necessary. The paper will
appear in the Annals of Mathematical Statistics.
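
To make the renewal equation concrete, here is a rough numerical sketch. It is an illustration added here and is not a method from the paper. It assumes the equation has the form u(t) = g(t) + ∫₀ᵗ u(t − x) f(x) dx, with F possessing a density f, and it uses a simple rectangle rule on an even grid. The function name, step size and exponential example are all hypothetical choices.

```python
import math


def solve_renewal(g, f, t_max, h):
    """Approximate u on the grid 0, h, 2h, ..., t_max for u = g + (u convolved with f)."""
    n = int(round(t_max / h))
    f_grid = [f(j * h) for j in range(n + 1)]
    u = [g(0.0)]  # at t = 0 the integral vanishes
    for i in range(1, n + 1):
        # rectangle rule for the convolution integral, using the points x = h, 2h, ..., ih
        conv = sum(f_grid[j] * u[i - j] for j in range(1, i + 1))
        u.append(g(i * h) + h * conv)
    return u


# Hypothetical example: exponential density with rate 1 and g = f,
# for which the exact solution is the constant 1.
rate = 1.0
density = lambda x: rate * math.exp(-rate * x)
u = solve_renewal(density, density, t_max=5.0, h=0.01)
print(u[0], u[len(u) // 2], u[-1])
```

The computed values stay close to the exact constant solution in this example, with a small drift that shrinks as the step h is reduced. The asymptotic analysis described in the abstract is of course analytic and does not depend on any such discretisation.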

Cumulative Frequency Functions. I. W. Burr, Purdue University.

Frequency and probability functions play a fundamental role in statistical theory and
practice. They are, however, often inconvenient and difficult to use, since it is necessary
to integrate or sum to find the probability for a given range. Theoretically the cumulative
or integral frequency function would seem to be better adapted to determining such
probabilities, since the latter can be found simply by a subtraction. The aim of this paper is
to make a contribution toward the direct use of cumulative frequency functions. Some
general properties and theory of cumulative functions are presented with particular
emphasis upon certain moment functions adapted to such direct use. Both continuous and
discrete cases are included. A list of possible cumulative functions is given and a particular
one, $F(x) = 1 - (1 + x^c)^{-k}$, discussed fully. This function has properties which make it
practicable and adaptable to a wide variety of distribution types. It well illustrates the
possibilities of the cumulative approach.
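
The remark that a probability for a range follows from the cumulative function "simply by a subtraction" can be sketched as below. This is an illustration added here. The printed formula is damaged in this copy, so the form F(x) = 1 − (1 + x^c)^(−k) for x > 0 is an assumption. The parameter values and function names are hypothetical.

```python
def burr_cdf(x, c, k):
    """Cumulative function F(x) = 1 - (1 + x**c)**(-k) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)


def prob_between(a, b, c, k):
    """Probability of the range a < x <= b, obtained by a single subtraction."""
    return burr_cdf(b, c, k) - burr_cdf(a, c, k)


# Illustrative parameters only: c = 2, k = 1.
print(prob_between(1.0, 2.0, c=2.0, k=1.0))
```

No integration or summation of a frequency function is needed, which is the convenience the abstract emphasises.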

On Spherical Probability Distributions. Kenneth J. Arnold, Massachusetts
Institute of Technology.

Two methods of correspondence for circular distributions to the normal error function
have led to non-constant absolutely continuous functions [see F. Zernike's article in
Handbuch der Physik, Vol. 3, pp. 477-478]. The corresponding distributions for the sphere are
found. The case of diametrical symmetry for both circle and sphere is discussed. Tables
of the probability integrals involved are given and an application in geology is included.

Some Observations on Analysis of Variance Theory. Hilda Geiringer,
Bryn Mawr College.

The test functions used in analysis of variance present themselves in different classes
of important problems. Their distribution has been determined and tabulated by R. A.
Fisher¹ under the hypothesis that the chance variables are all independent of each other and
subject to the same normal law. Consequently we can in this way test only the hypothesis
that the theoretical populations have all these properties.

If it is not possible to determine the exact distribution of test functions under sufficiently
general assumptions regarding the populations we may: (a) find an asymptotic solution of
the problem, i.e. determine the distribution of the test functions for large samples.² Or (b)
determine at least the mathematical expectations and the variances of the test functions
for appropriately general populations and for small samples.

It is well known that the expectations of the two quadratic forms which are basic in the
analysis of variance are equal, even if the n populations are not normal but equal to each
other (Bernoulli series). But, in addition, we can prove the mathematical theorem that,
under the same conditions the expectation of their quotient equals one. The next step
consists in studying the case that the n distributions are not equal to each other and to
investigate certain inequalities characteristic for the Lexis Series and Poisson Series. These
different criteria are completed by the computation of the variances of the test functions.

¹ "Metron," Vol. 5 (1925), p. 90-104.

² See e.g. W. G. Madow, Annals of Math. Stat., Vol. 11 (1940), p. 103.


In addition to the above mentioned test functions known as "variance within" and
"variance among" classes other symmetrical test functions have been considered in the
classical analysis of variance. Here again we may assume quite general populations. It
results that the Lexis as well as the Poisson Series may now be characterized by equalities
(instead of inequalities).

Finally it seems to be worthwhile to omit the assumption of independent chance variables
and to study different kinds of mutual dependence. These investigations lead to new
instructive inequalities among the expectations. These last considerations seem to be
connected with Fisher's "intraclass correlation" and to supplement this idea.


On the Asymptotic Distribution of Medians of Samples from a Multivariate
Population. A. M. Mood, University of Texas.

Let two variates $x_1$ and $x_2$ have a density function $f(x_1, x_2)$ which, besides being positive
or zero and having its integral over the whole space equal to one, shall satisfy these
conditions:




$\int f(x_1, 0)\, dx_1 \neq 0, \qquad \int f(0, x_2)\, dx_2 \neq 0.$


The coordinate system is assumed to have been chosen so that the population median is at
the origin. Let $(\tilde{x}_1, \tilde{x}_2)$ be the median of a sample of $2n + 1$ elements drawn from a
population with this density function. It is shown that for large samples $(\tilde{x}_1, \tilde{x}_2)$ is normally
distributed to within terms of order $1/\sqrt{n}$ with zero means and variances and covariances
given by certain integrals of $f(x_1, x_2)$.

A similar result is true for $k$ as well as two variates.

A Problem in Estimation. Joseph F. Daly, The Catholic University of America.

Consider a normal population in which each individual is characterized by the variates
$y_1, \ldots, y_p, y_{p+1}, y_{p+2}$. Suppose that the latter two are not directly observable, but that
for given values of $y_{p+1}, y_{p+2}$ the first set of $y$'s is independently distributed about the
"regression line" $y_k = y_{p+1} + k\,y_{p+2}$ $(k = 1, \ldots, p)$ with a common variance $\sigma^2$. For each
individual, one can thus determine values $\hat{y}_{p+1}, \hat{y}_{p+2}$ from the observed $y_1, \ldots, y_p$, using
the method of least squares. Assuming a similar relation between the expected values of
$y_1, \ldots, y_{p+2}$ in the original population, these estimates $\hat{y}_{p+1}, \hat{y}_{p+2}$ are, of course, unbiased.
However, if we calculate these $\hat{y}$'s for each individual of a sample of $N$, and substitute them
in the Pearson product-moment correlation formula, the estimate of the correlation
between $y_{p+1}$ and $y_{p+2}$ thus obtained is somewhat biased. The bias depends on the number of
observable $y$'s, and on the size of the variances and covariances of $y_{p+1}, y_{p+2}$ relative to $\sigma^2$.

Is Sampling Theory Applicable to Economic Time Series? T. J. Koopmans,
Penn Mutual Life Insurance Company.

The classical regression theory assumes that the values of the independent variables
remain the same in repeated samples. Certain situations in economic analysis, like price
formation according to the "cobweb" theorem, require a sampling theory of serial regression
in which certain observations may represent a dependent variable at one time and an
independent variable at a later time. This leads to the problem of the joint distribution of
certain quadratic forms in normal variables.

The simplest problem of this type is that of the distribution of the ratio $r = q/p$ of a
quadratic form $q$ in $T$ observations from a normal distribution with mean 0 to the sum $p$




of the squares of these observations. The distribution of $r$ is independent of that of $p$
and is


where the $\kappa_t$ are the characteristic values of $q$, while the path of integration runs
from $r$ through the lower half of the complex plane to a point on the real axis beyond
$\kappa_1$, and from there returns to $r$ through the upper half-plane.

In testing for the presence or absence of serial correlation or regression, $q$ is the sum of
products of successive observations, and $\kappa_t = \cos \pi t/(T + 1)$. If the discrete values
$\kappa_t$ are replaced in the above integral by a continuous variable of similar distribution, the
following approximation to the distribution of $r$ is found:


oe* <1 d* 



CONSTITUTION 
OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose

1. This organization shall be known as the Institute of Mathematical Statistics.

2. Its object shall be to promote the interests of mathematical statistics.

ARTICLE II

Membership

1. The membership of the Institute shall consist of Members, Fellows, Honorary
Members, and Sustaining Members.

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who
have been members for twenty-three months prior to the date of voting.

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on
Publications

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a
Secretary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting
of the Institute. Voting may be in person or by mail.

(a) Exception. The first group of Officers shall be elected by a majority vote of the
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers and the previous
President.

3. The Institute shall have a Committee on Membership composed of three Fellows.
At their first meeting subsequent to the adoption of this Constitution, the Board of
Directors shall elect three members as Fellows to serve as the Committee on Membership,
one member of the Committee for a term of one year, another for a term of two years,
and another for a term of three years. Thereafter the Board of Directors shall elect
from among the Fellows one member annually at their first meeting after their election
for a term of three years. The President shall designate one of the Vice-Presidents as
Chairman of this Committee.

4. The Institute shall have a Committee on Publications composed of three Members
or Fellows elected by the Board of Directors. The President shall designate a
Vice-President as Ex Officio Chairman of this Committee.




ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers,
and for the transaction of other business of the Institute shall be held annually at such
time and place as the Board of Directors may designate. Additional meetings may be called from










